{
 "cells": [
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "# 3 - Apache Spark ML - Create machine learning models\n",
    "\n",
    "In this chapter, you will:\n",
    "\n",
    "• Build your first ML model with Spark ML\n",
    "\n",
    "• Learn more about Spark functionality and how to use it"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 17,
   "metadata": {},
   "outputs": [],
   "source": [
    "from pyspark.sql import SparkSession\n",
    "\n",
    "# Start (or reuse) a local Spark session that uses all available cores\n",
    "spark = SparkSession.builder \\\n",
    "    .master('local[*]') \\\n",
    "    .appName(\"ApacheSparkML\") \\\n",
    "    .getOrCreate()\n",
    "\n",
    "# Load the preprocessed training and test sets stored as Parquet\n",
    "df_train = spark.read.parquet(\"classified_train_data\")\n",
    "df_test = spark.read.parquet(\"classified_test_data\")\n",
    "\n"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 18,
   "metadata": {},
   "outputs": [
    {
     "data": {
      "text/html": [
       "<div>\n",
       "<style scoped>\n",
       "    .dataframe tbody tr th:only-of-type {\n",
       "        vertical-align: middle;\n",
       "    }\n",
       "\n",
       "    .dataframe tbody tr th {\n",
       "        vertical-align: top;\n",
       "    }\n",
       "\n",
       "    .dataframe thead th {\n",
       "        text-align: right;\n",
       "    }\n",
       "</style>\n",
       "<table border=\"1\" class=\"dataframe\">\n",
       "  <thead>\n",
       "    <tr style=\"text-align: right;\">\n",
       "      <th></th>\n",
       "      <th>screen_name</th>\n",
       "      <th>location</th>\n",
       "      <th>description</th>\n",
       "      <th>url</th>\n",
       "      <th>followers_count</th>\n",
       "      <th>friends_count</th>\n",
       "      <th>listed_count</th>\n",
       "      <th>created_at</th>\n",
       "      <th>favourites_count</th>\n",
       "      <th>verified</th>\n",
       "      <th>statuses_count</th>\n",
       "      <th>lang</th>\n",
       "      <th>status</th>\n",
       "      <th>default_profile</th>\n",
       "      <th>default_profile_image</th>\n",
       "      <th>has_extended_profile</th>\n",
       "      <th>name</th>\n",
       "      <th>bot</th>\n",
       "    </tr>\n",
       "  </thead>\n",
       "  <tbody>\n",
       "    <tr>\n",
       "      <th>0</th>\n",
       "      <td>1</td>\n",
       "      <td>0</td>\n",
       "      <td>[, I, was, born, ysterday, tomorrow, wasnt, bor]</td>\n",
       "      <td>None</td>\n",
       "      <td>0</td>\n",
       "      <td>60</td>\n",
       "      <td>0</td>\n",
       "      <td>Sun Feb 19 03:47:42 +0000 2017</td>\n",
       "      <td>0</td>\n",
       "      <td>0</td>\n",
       "      <td>10</td>\n",
       "      <td>en</td>\n",
       "      <td>1</td>\n",
       "      <td>0</td>\n",
       "      <td>None</td>\n",
       "      <td>None</td>\n",
       "      <td>1</td>\n",
       "      <td>None</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>1</th>\n",
       "      <td>1</td>\n",
       "      <td>0</td>\n",
       "      <td>[, avi, Robot, i, that, little, Your, Now, lif...</td>\n",
       "      <td>https://t.co/qGCGttmwIw</td>\n",
       "      <td>5388</td>\n",
       "      <td>0</td>\n",
       "      <td>178</td>\n",
       "      <td>Thu Oct 23 17:37:54 +0000 2014</td>\n",
       "      <td>0</td>\n",
       "      <td>0</td>\n",
       "      <td>31751</td>\n",
       "      <td>en</td>\n",
       "      <td>1</td>\n",
       "      <td>0</td>\n",
       "      <td>None</td>\n",
       "      <td>None</td>\n",
       "      <td>1</td>\n",
       "      <td>None</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>2</th>\n",
       "      <td>1</td>\n",
       "      <td>0</td>\n",
       "      <td>[, by, @inky, JÌ_rgen, Sebastian, composer, of...</td>\n",
       "      <td>http://t.co/PdagJGqVMR</td>\n",
       "      <td>207</td>\n",
       "      <td>8</td>\n",
       "      <td>21</td>\n",
       "      <td>Fri Jul 11 23:18:41 +0000 2014</td>\n",
       "      <td>0</td>\n",
       "      <td>0</td>\n",
       "      <td>5591</td>\n",
       "      <td>en-gb</td>\n",
       "      <td>1</td>\n",
       "      <td>0</td>\n",
       "      <td>None</td>\n",
       "      <td>None</td>\n",
       "      <td>1</td>\n",
       "      <td>None</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>3</th>\n",
       "      <td>1</td>\n",
       "      <td>0</td>\n",
       "      <td>[, by, @inky, a, https:tcoMJyd6NkYaf, bot, sou...</td>\n",
       "      <td>http://t.co/PdagJGIwEp</td>\n",
       "      <td>93</td>\n",
       "      <td>0</td>\n",
       "      <td>24</td>\n",
       "      <td>Tue May 26 18:23:01 +0000 2015</td>\n",
       "      <td>0</td>\n",
       "      <td>0</td>\n",
       "      <td>5145</td>\n",
       "      <td>en</td>\n",
       "      <td>1</td>\n",
       "      <td>0</td>\n",
       "      <td>None</td>\n",
       "      <td>None</td>\n",
       "      <td>1</td>\n",
       "      <td>None</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>4</th>\n",
       "      <td>1</td>\n",
       "      <td>0</td>\n",
       "      <td>[, by, at, @mattlaschneider, host, a, my, serv...</td>\n",
       "      <td>None</td>\n",
       "      <td>183</td>\n",
       "      <td>1</td>\n",
       "      <td>16</td>\n",
       "      <td>Thu May 19 17:00:07 +0000 2016</td>\n",
       "      <td>1</td>\n",
       "      <td>0</td>\n",
       "      <td>3511</td>\n",
       "      <td>en</td>\n",
       "      <td>1</td>\n",
       "      <td>0</td>\n",
       "      <td>None</td>\n",
       "      <td>None</td>\n",
       "      <td>1</td>\n",
       "      <td>None</td>\n",
       "    </tr>\n",
       "  </tbody>\n",
       "</table>\n",
       "</div>"
      ],
      "text/plain": [
       "   screen_name  location                                        description  \\\n",
       "0            1         0   [, I, was, born, ysterday, tomorrow, wasnt, bor]   \n",
       "1            1         0  [, avi, Robot, i, that, little, Your, Now, lif...   \n",
       "2            1         0  [, by, @inky, JÌ_rgen, Sebastian, composer, of...   \n",
       "3            1         0  [, by, @inky, a, https:tcoMJyd6NkYaf, bot, sou...   \n",
       "4            1         0  [, by, at, @mattlaschneider, host, a, my, serv...   \n",
       "\n",
       "                       url  followers_count  friends_count  listed_count  \\\n",
       "0                     None                0             60             0   \n",
       "1  https://t.co/qGCGttmwIw             5388              0           178   \n",
       "2   http://t.co/PdagJGqVMR              207              8            21   \n",
       "3   http://t.co/PdagJGIwEp               93              0            24   \n",
       "4                     None              183              1            16   \n",
       "\n",
       "                       created_at  favourites_count  verified  statuses_count  \\\n",
       "0  Sun Feb 19 03:47:42 +0000 2017                 0         0              10   \n",
       "1  Thu Oct 23 17:37:54 +0000 2014                 0         0           31751   \n",
       "2  Fri Jul 11 23:18:41 +0000 2014                 0         0            5591   \n",
       "3  Tue May 26 18:23:01 +0000 2015                 0         0            5145   \n",
       "4  Thu May 19 17:00:07 +0000 2016                 1         0            3511   \n",
       "\n",
       "    lang  status  default_profile default_profile_image has_extended_profile  \\\n",
       "0     en       1                0                  None                 None   \n",
       "1     en       1                0                  None                 None   \n",
       "2  en-gb       1                0                  None                 None   \n",
       "3     en       1                0                  None                 None   \n",
       "4     en       1                0                  None                 None   \n",
       "\n",
       "   name   bot  \n",
       "0     1  None  \n",
       "1     1  None  \n",
       "2     1  None  \n",
       "3     1  None  \n",
       "4     1  None  "
      ]
     },
     "execution_count": 18,
     "metadata": {},
     "output_type": "execute_result"
    }
   ],
   "source": [
    "df_train.limit(5).toPandas()"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
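    "✅ Task:\n",
    "\n",
    "### Frequent Pattern Growth algorithm - FPGrowth()\n",
    "\n",
    "Your data is clean and organized. Now it's your turn to create your **first** ML model with Spark.\n",
    "\n",
    "Run the next code cell. It runs the **Frequent Pattern Growth** (FP-Growth) algorithm, which extracts frequent patterns (sets of items that often appear together), if any exist.\n",
    "\n",
    "Then tweak the `minSupport` and `minConfidence` parameters. `minSupport` is the minimum fraction of rows in which an itemset must appear to be reported as frequent, and `minConfidence` is the minimum confidence an association rule must reach to be generated.\n",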
    "\n",
    "\n",
    "<details><summary>Hint</summary>\n",
    "<p>\n",
    "\n",
    "Set both `minSupport` and `minConfidence` to 0.1 and see what happens.\n",
    "\n",
    "</p>\n",
    "</details>\n",
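    "\n",
    "The `freq`, `confidence`, and `lift` columns in the FPGrowth output are standard association-rule metrics. As a rough illustration, the plain-Python sketch below recomputes them on a tiny, hypothetical set of word lists (`transactions` and all helper names are made up for this example and are not taken from the dataset or from Spark). It enumerates itemsets by brute force instead of building an FP-tree as Spark's `FPGrowth` does, so it only shows what the two thresholds filter, not how Spark computes them efficiently.\n",
    "\n",
    "```python\n",
    "from itertools import combinations\n",
    "\n",
    "# Hypothetical toy data: each row is a set of distinct words. Spark's FPGrowth\n",
    "# likewise requires the items within one row to be unique.\n",
    "transactions = [\n",
    "    {'bot', 'by', 'a'},\n",
    "    {'bot', 'by', 'the'},\n",
    "    {'bot', 'a', 'the'},\n",
    "    {'by', 'a'},\n",
    "    {'the', 'of'},\n",
    "]\n",
    "n = len(transactions)\n",
    "\n",
    "def count(itemset):\n",
    "    # Number of rows containing every item in itemset (like Spark's freq column)\n",
    "    return sum(1 for t in transactions if itemset <= t)\n",
    "\n",
    "def frequent_itemsets(min_support):\n",
    "    # Brute force, not an FP-tree: keep itemsets whose support (count / n) >= min_support\n",
    "    items = sorted(set().union(*transactions))\n",
    "    freq = {}\n",
    "    for size in range(1, len(items) + 1):\n",
    "        for combo in combinations(items, size):\n",
    "            c = count(frozenset(combo))\n",
    "            if c / n >= min_support:\n",
    "                freq[frozenset(combo)] = c\n",
    "    return freq\n",
    "\n",
    "def association_rules(freq, min_confidence):\n",
    "    # Rules with a single-item consequent, as in Spark's associationRules output\n",
    "    rules = []\n",
    "    for itemset, c in freq.items():\n",
    "        if len(itemset) < 2:\n",
    "            continue\n",
    "        for consequent in itemset:\n",
    "            antecedent = itemset - {consequent}\n",
    "            # Every subset of a frequent itemset is also frequent, so both lookups succeed\n",
    "            confidence = c / freq[antecedent]\n",
    "            lift = confidence / (freq[frozenset({consequent})] / n)\n",
    "            if confidence >= min_confidence:\n",
    "                rules.append((sorted(antecedent), consequent, confidence, lift))\n",
    "    return rules\n",
    "\n",
    "freq = frequent_itemsets(min_support=0.4)\n",
    "for antecedent, consequent, confidence, lift in association_rules(freq, min_confidence=0.6):\n",
    "    print(antecedent, '->', consequent, round(confidence, 2), round(lift, 2))\n",
    "```\n",
    "\n",
    "Confidence is `count(antecedent + consequent) / count(antecedent)`, and lift divides that by the consequent's overall support, so a lift above 1 means the words co-occur more often than they would if they were independent. Lowering `min_support` admits more (rarer) itemsets, while `min_confidence` only prunes rules, not itemsets.\n",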
    "\n"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 19,
   "metadata": {},
   "outputs": [
    {
     "name": "stdout",
     "output_type": "stream",
     "text": [
      "+---------------+----+\n",
      "|          items|freq|\n",
      "+---------------+----+\n",
      "|           [My]|  18|\n",
      "|         [like]|  25|\n",
      "|           [it]|  48|\n",
      "|        [it, a]|  18|\n",
      "|          [for]| 190|\n",
      "|     [for, the]|  60|\n",
      "|[for, the, and]|  23|\n",
      "|        [for, ]|  19|\n",
      "|       [for, a]|  37|\n",
      "|      [for, in]|  42|\n",
      "|  [for, in, to]|  18|\n",
      "| [for, in, and]|  21|\n",
      "|      [for, of]|  41|\n",
      "| [for, of, and]|  20|\n",
      "|      [for, by]|  40|\n",
      "|      [for, to]|  38|\n",
      "|     [for, and]|  63|\n",
      "|       [people]|  36|\n",
      "|           [so]|  20|\n",
      "|       [random]|  28|\n",
      "+---------------+----+\n",
      "only showing top 20 rows\n",
      "\n",
      "+-------------+----------+-------------------+------------------+\n",
      "|   antecedent|consequent|         confidence|              lift|\n",
      "+-------------+----------+-------------------+------------------+\n",
      "|  [tweets, a]|      [by]| 0.7692307692307693|3.9953379953379953|\n",
      "| [tweets, by]|       [a]|              0.625| 3.798758865248227|\n",
      "|[bot, a, the]|      [by]|               0.75| 3.895454545454545|\n",
      "|      [a, by]|     [the]|0.33663366336633666|1.7121367923142463|\n",
      "|      [a, by]|      [of]|0.25742574257425743|1.2606506364922208|\n",
      "|      [a, by]|     [and]| 0.1782178217821782|0.7617589689143477|\n",
      "|      [a, by]|    [from]|0.22772277227722773|3.2257589395303166|\n",
      "|      [a, by]|    [with]|0.22772277227722773| 3.982824813093554|\n",
      "|      [a, by]|        []|0.25742574257425743|1.8855885588558856|\n",
      "|      [a, by]|     [bot]| 0.5445544554455446| 5.185368536853686|\n",
      "|      [a, by]|    [that]| 0.2079207920792079| 4.568926123381568|\n",
      "|      [a, by]|       [I]|0.26732673267326734| 2.695282469423413|\n",
      "|      [a, by]|  [tweets]|0.19801980198019803| 6.788118811881188|\n",
      "|      [a, by]|     [day]|0.21782178217821782| 9.824908806670141|\n",
      "|      [a, by]|      [to]|0.24752475247524752|1.8286958006145442|\n",
      "|      [a, by]|     [Bot]| 0.1782178217821782| 6.233986663972519|\n",
      "|    [day, by]|       [a]| 0.8148148148148148| 4.952456002101392|\n",
      "|    [in, and]|     [for]| 0.2876712328767123| 2.595097332372026|\n",
      "|    [in, and]|     [the]| 0.4246575342465753|2.1598309011828785|\n",
      "|    [in, and]|      [of]| 0.3013698630136986|1.4758512720156556|\n",
      "+-------------+----------+-------------------+------------------+\n",
      "only showing top 20 rows\n",
      "\n"
     ]
    },
    {
     "data": {
      "text/html": [
       "<div>\n",
       "<style scoped>\n",
       "    .dataframe tbody tr th:only-of-type {\n",
       "        vertical-align: middle;\n",
       "    }\n",
       "\n",
       "    .dataframe tbody tr th {\n",
       "        vertical-align: top;\n",
       "    }\n",
       "\n",
       "    .dataframe thead th {\n",
       "        text-align: right;\n",
       "    }\n",
       "</style>\n",
       "<table border=\"1\" class=\"dataframe\">\n",
       "  <thead>\n",
       "    <tr style=\"text-align: right;\">\n",
       "      <th></th>\n",
       "      <th>0</th>\n",
       "      <th>1</th>\n",
       "      <th>2</th>\n",
       "      <th>3</th>\n",
       "      <th>4</th>\n",
       "      <th>5</th>\n",
       "      <th>6</th>\n",
       "      <th>7</th>\n",
       "      <th>8</th>\n",
       "      <th>9</th>\n",
       "      <th>...</th>\n",
       "      <th>1704</th>\n",
       "      <th>1705</th>\n",
       "      <th>1706</th>\n",
       "      <th>1707</th>\n",
       "      <th>1708</th>\n",
       "      <th>1709</th>\n",
       "      <th>1710</th>\n",
       "      <th>1711</th>\n",
       "      <th>1712</th>\n",
       "      <th>1713</th>\n",
       "    </tr>\n",
       "  </thead>\n",
       "  <tbody>\n",
       "    <tr>\n",
       "      <th>screen_name</th>\n",
       "      <td>1</td>\n",
       "      <td>1</td>\n",
       "      <td>1</td>\n",
       "      <td>1</td>\n",
       "      <td>1</td>\n",
       "      <td>1</td>\n",
       "      <td>1</td>\n",
       "      <td>1</td>\n",
       "      <td>1</td>\n",
       "      <td>1</td>\n",
       "      <td>...</td>\n",
       "      <td>1</td>\n",
       "      <td>1</td>\n",
       "      <td>1</td>\n",
       "      <td>1</td>\n",
       "      <td>1</td>\n",
       "      <td>1</td>\n",
       "      <td>1</td>\n",
       "      <td>1</td>\n",
       "      <td>1</td>\n",
       "      <td>1</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>location</th>\n",
       "      <td>0</td>\n",
       "      <td>0</td>\n",
       "      <td>0</td>\n",
       "      <td>0</td>\n",
       "      <td>0</td>\n",
       "      <td>0</td>\n",
       "      <td>0</td>\n",
       "      <td>0</td>\n",
       "      <td>0</td>\n",
       "      <td>0</td>\n",
       "      <td>...</td>\n",
       "      <td>1</td>\n",
       "      <td>1</td>\n",
       "      <td>1</td>\n",
       "      <td>1</td>\n",
       "      <td>1</td>\n",
       "      <td>1</td>\n",
       "      <td>1</td>\n",
       "      <td>1</td>\n",
       "      <td>1</td>\n",
       "      <td>1</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>description</th>\n",
       "      <td>[, I, was, born, ysterday, tomorrow, wasnt, bor]</td>\n",
       "      <td>[, avi, Robot, i, that, little, Your, Now, lif...</td>\n",
       "      <td>[, by, @inky, JÌ_rgen, Sebastian, composer, of...</td>\n",
       "      <td>[, by, @inky, a, https:tcoMJyd6NkYaf, bot, sou...</td>\n",
       "      <td>[, by, at, @mattlaschneider, host, a, my, serv...</td>\n",
       "      <td>[, insta:, litivelli, litivelli_, snap:]</td>\n",
       "      <td>[, starting, @avoision, Twitter, posting, with...</td>\n",
       "      <td>[, your, improved?, to, own, it, New, @dbaker_...</td>\n",
       "      <td>[#botALLY, by, @czircon, Bot]</td>\n",
       "      <td>[10, Following, I, Im, Tweets, not, but, minut...</td>\n",
       "      <td>...</td>\n",
       "      <td>[guitar, I, coming, throat, album, play, in, t...</td>\n",
       "      <td>[healthy, @UMatterDontQuit, so, choices, I, @R...</td>\n",
       "      <td>[literatura, de, Troca, Livro, livros, livraria]</td>\n",
       "      <td>[naughty, \"\"you, bad, \"\"\"20, me, made, and]</td>\n",
       "      <td>[never, myself, depend, but, on, nobody]</td>\n",
       "      <td>[next?, What, @ThisIsFusion, John, bot, http:t...</td>\n",
       "      <td>[since, 1949, Sprezzy, but, messy]</td>\n",
       "      <td>[since, 1956, everywhere, cooks, Inspiring]</td>\n",
       "      <td>[technolust, yr, trust]</td>\n",
       "      <td>[where, Television, brought, belongs, -, home,...</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>url</th>\n",
       "      <td>None</td>\n",
       "      <td>https://t.co/qGCGttmwIw</td>\n",
       "      <td>http://t.co/PdagJGqVMR</td>\n",
       "      <td>http://t.co/PdagJGIwEp</td>\n",
       "      <td>None</td>\n",
       "      <td>None</td>\n",
       "      <td>None</td>\n",
       "      <td>http://t.co/wIjkIUKTLc</td>\n",
       "      <td>https://t.co/b6Jew2xZ5t</td>\n",
       "      <td>None</td>\n",
       "      <td>...</td>\n",
       "      <td>https://t.co/2SZUeRSMor</td>\n",
       "      <td>http://t.co/i0zlHsC1xM</td>\n",
       "      <td>None</td>\n",
       "      <td>call me a bitch im proud of it.\"\"\"\"\"</td>\n",
       "      <td>https://t.co/GHrUZJLiqw</td>\n",
       "      <td>None</td>\n",
       "      <td>None</td>\n",
       "      <td>None</td>\n",
       "      <td>http://t.co/QD54bIMDmT</td>\n",
       "      <td>None</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>followers_count</th>\n",
       "      <td>0</td>\n",
       "      <td>5388</td>\n",
       "      <td>207</td>\n",
       "      <td>93</td>\n",
       "      <td>183</td>\n",
       "      <td>302</td>\n",
       "      <td>136</td>\n",
       "      <td>134</td>\n",
       "      <td>26</td>\n",
       "      <td>67</td>\n",
       "      <td>...</td>\n",
       "      <td>50441</td>\n",
       "      <td>435871</td>\n",
       "      <td>24244</td>\n",
       "      <td>NaN</td>\n",
       "      <td>62647</td>\n",
       "      <td>1628</td>\n",
       "      <td>9</td>\n",
       "      <td>11</td>\n",
       "      <td>75</td>\n",
       "      <td>1</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>friends_count</th>\n",
       "      <td>60</td>\n",
       "      <td>0</td>\n",
       "      <td>8</td>\n",
       "      <td>0</td>\n",
       "      <td>1</td>\n",
       "      <td>249</td>\n",
       "      <td>1</td>\n",
       "      <td>0</td>\n",
       "      <td>0</td>\n",
       "      <td>0</td>\n",
       "      <td>...</td>\n",
       "      <td>197</td>\n",
       "      <td>2337</td>\n",
       "      <td>24852</td>\n",
       "      <td>1391</td>\n",
       "      <td>277</td>\n",
       "      <td>4</td>\n",
       "      <td>462</td>\n",
       "      <td>745</td>\n",
       "      <td>4</td>\n",
       "      <td>365</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>listed_count</th>\n",
       "      <td>0</td>\n",
       "      <td>178</td>\n",
       "      <td>21</td>\n",
       "      <td>24</td>\n",
       "      <td>16</td>\n",
       "      <td>4</td>\n",
       "      <td>21</td>\n",
       "      <td>17</td>\n",
       "      <td>6</td>\n",
       "      <td>19</td>\n",
       "      <td>...</td>\n",
       "      <td>894</td>\n",
       "      <td>4662</td>\n",
       "      <td>742</td>\n",
       "      <td>477</td>\n",
       "      <td>1197</td>\n",
       "      <td>7</td>\n",
       "      <td>0</td>\n",
       "      <td>0</td>\n",
       "      <td>12</td>\n",
       "      <td>0</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>created_at</th>\n",
       "      <td>Sun Feb 19 03:47:42 +0000 2017</td>\n",
       "      <td>Thu Oct 23 17:37:54 +0000 2014</td>\n",
       "      <td>Fri Jul 11 23:18:41 +0000 2014</td>\n",
       "      <td>Tue May 26 18:23:01 +0000 2015</td>\n",
       "      <td>Thu May 19 17:00:07 +0000 2016</td>\n",
       "      <td>8/6/2009 18:16</td>\n",
       "      <td>Sat Jan 17 01:58:53 +0000 2015</td>\n",
       "      <td>Sat Jul 05 12:25:08 +0000 2014</td>\n",
       "      <td>Fri Apr 08 19:19:00 +0000 2016</td>\n",
       "      <td>7/22/2014 3:42</td>\n",
       "      <td>...</td>\n",
       "      <td>Thu Mar 12 06:24:54 +0000 2009</td>\n",
       "      <td>Wed Feb 04 20:07:27 +0000 2009</td>\n",
       "      <td>3/18/2009 15:09</td>\n",
       "      <td>108</td>\n",
       "      <td>Sun Aug 23 18:54:32 +0000 2009</td>\n",
       "      <td>Thu Aug 27 15:54:12 +0000 2015</td>\n",
       "      <td>10/29/2016 22:46</td>\n",
       "      <td>1/1/2015 17:44</td>\n",
       "      <td>Tue Sep 29 22:01:10 +0000 2015</td>\n",
       "      <td>1/30/2016 21:26</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>favourites_count</th>\n",
       "      <td>0</td>\n",
       "      <td>0</td>\n",
       "      <td>0</td>\n",
       "      <td>0</td>\n",
       "      <td>1</td>\n",
       "      <td>953</td>\n",
       "      <td>5</td>\n",
       "      <td>0</td>\n",
       "      <td>0</td>\n",
       "      <td>0</td>\n",
       "      <td>...</td>\n",
       "      <td>2783</td>\n",
       "      <td>575</td>\n",
       "      <td>26</td>\n",
       "      <td>NaN</td>\n",
       "      <td>431</td>\n",
       "      <td>3</td>\n",
       "      <td>84</td>\n",
       "      <td>146</td>\n",
       "      <td>12</td>\n",
       "      <td>29</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>verified</th>\n",
       "      <td>0</td>\n",
       "      <td>0</td>\n",
       "      <td>0</td>\n",
       "      <td>0</td>\n",
       "      <td>0</td>\n",
       "      <td>0</td>\n",
       "      <td>0</td>\n",
       "      <td>0</td>\n",
       "      <td>0</td>\n",
       "      <td>0</td>\n",
       "      <td>...</td>\n",
       "      <td>1</td>\n",
       "      <td>1</td>\n",
       "      <td>0</td>\n",
       "      <td>0</td>\n",
       "      <td>1</td>\n",
       "      <td>0</td>\n",
       "      <td>0</td>\n",
       "      <td>0</td>\n",
       "      <td>0</td>\n",
       "      <td>0</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>statuses_count</th>\n",
       "      <td>10</td>\n",
       "      <td>31751</td>\n",
       "      <td>5591</td>\n",
       "      <td>5145</td>\n",
       "      <td>3511</td>\n",
       "      <td>17338</td>\n",
       "      <td>34446</td>\n",
       "      <td>7597</td>\n",
       "      <td>6832</td>\n",
       "      <td>120232</td>\n",
       "      <td>...</td>\n",
       "      <td>4652</td>\n",
       "      <td>27102</td>\n",
       "      <td>1568</td>\n",
       "      <td>NaN</td>\n",
       "      <td>11885</td>\n",
       "      <td>5155</td>\n",
       "      <td>72</td>\n",
       "      <td>185</td>\n",
       "      <td>1770</td>\n",
       "      <td>25</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>lang</th>\n",
       "      <td>en</td>\n",
       "      <td>en</td>\n",
       "      <td>en-gb</td>\n",
       "      <td>en</td>\n",
       "      <td>en</td>\n",
       "      <td>pt</td>\n",
       "      <td>en</td>\n",
       "      <td>en</td>\n",
       "      <td>en</td>\n",
       "      <td>en</td>\n",
       "      <td>...</td>\n",
       "      <td>en</td>\n",
       "      <td>en</td>\n",
       "      <td>pt</td>\n",
       "      <td>74779</td>\n",
       "      <td>en</td>\n",
       "      <td>en</td>\n",
       "      <td>en</td>\n",
       "      <td>en</td>\n",
       "      <td>en</td>\n",
       "      <td>en</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>status</th>\n",
       "      <td>1</td>\n",
       "      <td>1</td>\n",
       "      <td>1</td>\n",
       "      <td>1</td>\n",
       "      <td>1</td>\n",
       "      <td>1</td>\n",
       "      <td>1</td>\n",
       "      <td>1</td>\n",
       "      <td>1</td>\n",
       "      <td>1</td>\n",
       "      <td>...</td>\n",
       "      <td>1</td>\n",
       "      <td>1</td>\n",
       "      <td>1</td>\n",
       "      <td>1</td>\n",
       "      <td>1</td>\n",
       "      <td>1</td>\n",
       "      <td>1</td>\n",
       "      <td>1</td>\n",
       "      <td>1</td>\n",
       "      <td>1</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>default_profile</th>\n",
       "      <td>0</td>\n",
       "      <td>0</td>\n",
       "      <td>0</td>\n",
       "      <td>0</td>\n",
       "      <td>0</td>\n",
       "      <td>0</td>\n",
       "      <td>0</td>\n",
       "      <td>0</td>\n",
       "      <td>0</td>\n",
       "      <td>0</td>\n",
       "      <td>...</td>\n",
       "      <td>0</td>\n",
       "      <td>0</td>\n",
       "      <td>0</td>\n",
       "      <td>0</td>\n",
       "      <td>0</td>\n",
       "      <td>0</td>\n",
       "      <td>1</td>\n",
       "      <td>0</td>\n",
       "      <td>0</td>\n",
       "      <td>1</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>default_profile_image</th>\n",
       "      <td>None</td>\n",
       "      <td>None</td>\n",
       "      <td>None</td>\n",
       "      <td>None</td>\n",
       "      <td>None</td>\n",
       "      <td>False</td>\n",
       "      <td>None</td>\n",
       "      <td>None</td>\n",
       "      <td>None</td>\n",
       "      <td>None</td>\n",
       "      <td>...</td>\n",
       "      <td>None</td>\n",
       "      <td>None</td>\n",
       "      <td>False</td>\n",
       "      <td>None</td>\n",
       "      <td>None</td>\n",
       "      <td>None</td>\n",
       "      <td>False</td>\n",
       "      <td>False</td>\n",
       "      <td>None</td>\n",
       "      <td>False</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>has_extended_profile</th>\n",
       "      <td>None</td>\n",
       "      <td>None</td>\n",
       "      <td>None</td>\n",
       "      <td>None</td>\n",
       "      <td>None</td>\n",
       "      <td>False</td>\n",
       "      <td>None</td>\n",
       "      <td>None</td>\n",
       "      <td>None</td>\n",
       "      <td>None</td>\n",
       "      <td>...</td>\n",
       "      <td>None</td>\n",
       "      <td>None</td>\n",
       "      <td>False</td>\n",
       "      <td>None</td>\n",
       "      <td>None</td>\n",
       "      <td>None</td>\n",
       "      <td>False</td>\n",
       "      <td>False</td>\n",
       "      <td>None</td>\n",
       "      <td>False</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>name</th>\n",
       "      <td>1</td>\n",
       "      <td>1</td>\n",
       "      <td>1</td>\n",
       "      <td>1</td>\n",
       "      <td>1</td>\n",
       "      <td>1</td>\n",
       "      <td>1</td>\n",
       "      <td>1</td>\n",
       "      <td>1</td>\n",
       "      <td>1</td>\n",
       "      <td>...</td>\n",
       "      <td>1</td>\n",
       "      <td>1</td>\n",
       "      <td>1</td>\n",
       "      <td>1</td>\n",
       "      <td>1</td>\n",
       "      <td>1</td>\n",
       "      <td>1</td>\n",
       "      <td>1</td>\n",
       "      <td>1</td>\n",
       "      <td>1</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>bot</th>\n",
       "      <td>NaN</td>\n",
       "      <td>NaN</td>\n",
       "      <td>NaN</td>\n",
       "      <td>NaN</td>\n",
       "      <td>NaN</td>\n",
       "      <td>0</td>\n",
       "      <td>NaN</td>\n",
       "      <td>NaN</td>\n",
       "      <td>NaN</td>\n",
       "      <td>NaN</td>\n",
       "      <td>...</td>\n",
       "      <td>NaN</td>\n",
       "      <td>NaN</td>\n",
       "      <td>0</td>\n",
       "      <td>NaN</td>\n",
       "      <td>NaN</td>\n",
       "      <td>NaN</td>\n",
       "      <td>1</td>\n",
       "      <td>1</td>\n",
       "      <td>NaN</td>\n",
       "      <td>1</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>prediction</th>\n",
       "      <td>[the, a, of, by, and, bot, to, in, from, am, w...</td>\n",
       "      <td>[the, of, and, with, I, day, to, Bot, for, A, ...</td>\n",
       "      <td>[a, the, for, from, is, and, I, to, in, bot, #...</td>\n",
       "      <td>[the, of, and, from, with, that, I, tweets, da...</td>\n",
       "      <td>[the, of, and, from, with, that, I, tweets, da...</td>\n",
       "      <td>[the, a, of, by, and, bot, I, to, in]</td>\n",
       "      <td>[a, from, for, the, of, and, to, that, A, #bot...</td>\n",
       "      <td>[a, bot, the, of, I, in, for, from, with, is, ...</td>\n",
       "      <td>[a, the, of, bot, for, from, with, , that, A, ...</td>\n",
       "      <td>[and, a, for, The, from, , is, to, in, by, bot...</td>\n",
       "      <td>...</td>\n",
       "      <td>[for, of, to, a, by, is, from, with, every, bo...</td>\n",
       "      <td>[a, in, of, to, and, from, am, with, that, you...</td>\n",
       "      <td>[]</td>\n",
       "      <td>[for, a, from, , is, of, bot, the, I, to, on, ...</td>\n",
       "      <td>[for, the, a, of, by, to, and]</td>\n",
       "      <td>[for, the, a, of, by, to, and, #botALLY, from,...</td>\n",
       "      <td>[]</td>\n",
       "      <td>[]</td>\n",
       "      <td>[]</td>\n",
       "      <td>[a, for, at, from, with, , is, bot, of, and, I...</td>\n",
       "    </tr>\n",
       "  </tbody>\n",
       "</table>\n",
       "<p>19 rows × 1714 columns</p>\n",
       "</div>"
      ],
      "text/plain": [
       "                                                                    0     \\\n",
       "screen_name                                                            1   \n",
       "location                                                               0   \n",
       "description             [, I, was, born, ysterday, tomorrow, wasnt, bor]   \n",
       "url                                                                 None   \n",
       "followers_count                                                        0   \n",
       "friends_count                                                         60   \n",
       "listed_count                                                           0   \n",
       "created_at                                Sun Feb 19 03:47:42 +0000 2017   \n",
       "favourites_count                                                       0   \n",
       "verified                                                               0   \n",
       "statuses_count                                                        10   \n",
       "lang                                                                  en   \n",
       "status                                                                 1   \n",
       "default_profile                                                        0   \n",
       "default_profile_image                                               None   \n",
       "has_extended_profile                                                None   \n",
       "name                                                                   1   \n",
       "bot                                                                  NaN   \n",
       "prediction             [the, a, of, by, and, bot, to, in, from, am, w...   \n",
       "\n",
       "                                                                    1     \\\n",
       "screen_name                                                            1   \n",
       "location                                                               0   \n",
       "description            [, avi, Robot, i, that, little, Your, Now, lif...   \n",
       "url                                              https://t.co/qGCGttmwIw   \n",
       "followers_count                                                     5388   \n",
       "friends_count                                                          0   \n",
       "listed_count                                                         178   \n",
       "created_at                                Thu Oct 23 17:37:54 +0000 2014   \n",
       "favourites_count                                                       0   \n",
       "verified                                                               0   \n",
       "statuses_count                                                     31751   \n",
       "lang                                                                  en   \n",
       "status                                                                 1   \n",
       "default_profile                                                        0   \n",
       "default_profile_image                                               None   \n",
       "has_extended_profile                                                None   \n",
       "name                                                                   1   \n",
       "bot                                                                  NaN   \n",
       "prediction             [the, of, and, with, I, day, to, Bot, for, A, ...   \n",
       "\n",
       "                                                                    2     \\\n",
       "screen_name                                                            1   \n",
       "location                                                               0   \n",
       "description            [, by, @inky, JÌ_rgen, Sebastian, composer, of...   \n",
       "url                                               http://t.co/PdagJGqVMR   \n",
       "followers_count                                                      207   \n",
       "friends_count                                                          8   \n",
       "listed_count                                                          21   \n",
       "created_at                                Fri Jul 11 23:18:41 +0000 2014   \n",
       "favourites_count                                                       0   \n",
       "verified                                                               0   \n",
       "statuses_count                                                      5591   \n",
       "lang                                                               en-gb   \n",
       "status                                                                 1   \n",
       "default_profile                                                        0   \n",
       "default_profile_image                                               None   \n",
       "has_extended_profile                                                None   \n",
       "name                                                                   1   \n",
       "bot                                                                  NaN   \n",
       "prediction             [a, the, for, from, is, and, I, to, in, bot, #...   \n",
       "\n",
       "                                                                    3     \\\n",
       "screen_name                                                            1   \n",
       "location                                                               0   \n",
       "description            [, by, @inky, a, https:tcoMJyd6NkYaf, bot, sou...   \n",
       "url                                               http://t.co/PdagJGIwEp   \n",
       "followers_count                                                       93   \n",
       "friends_count                                                          0   \n",
       "listed_count                                                          24   \n",
       "created_at                                Tue May 26 18:23:01 +0000 2015   \n",
       "favourites_count                                                       0   \n",
       "verified                                                               0   \n",
       "statuses_count                                                      5145   \n",
       "lang                                                                  en   \n",
       "status                                                                 1   \n",
       "default_profile                                                        0   \n",
       "default_profile_image                                               None   \n",
       "has_extended_profile                                                None   \n",
       "name                                                                   1   \n",
       "bot                                                                  NaN   \n",
       "prediction             [the, of, and, from, with, that, I, tweets, da...   \n",
       "\n",
       "                                                                    4     \\\n",
       "screen_name                                                            1   \n",
       "location                                                               0   \n",
       "description            [, by, at, @mattlaschneider, host, a, my, serv...   \n",
       "url                                                                 None   \n",
       "followers_count                                                      183   \n",
       "friends_count                                                          1   \n",
       "listed_count                                                          16   \n",
       "created_at                                Thu May 19 17:00:07 +0000 2016   \n",
       "favourites_count                                                       1   \n",
       "verified                                                               0   \n",
       "statuses_count                                                      3511   \n",
       "lang                                                                  en   \n",
       "status                                                                 1   \n",
       "default_profile                                                        0   \n",
       "default_profile_image                                               None   \n",
       "has_extended_profile                                                None   \n",
       "name                                                                   1   \n",
       "bot                                                                  NaN   \n",
       "prediction             [the, of, and, from, with, that, I, tweets, da...   \n",
       "\n",
       "                                                           5     \\\n",
       "screen_name                                                   1   \n",
       "location                                                      0   \n",
       "description            [, insta:, litivelli, litivelli_, snap:]   \n",
       "url                                                        None   \n",
       "followers_count                                             302   \n",
       "friends_count                                               249   \n",
       "listed_count                                                  4   \n",
       "created_at                                       8/6/2009 18:16   \n",
       "favourites_count                                            953   \n",
       "verified                                                      0   \n",
       "statuses_count                                            17338   \n",
       "lang                                                         pt   \n",
       "status                                                        1   \n",
       "default_profile                                               0   \n",
       "default_profile_image                                     False   \n",
       "has_extended_profile                                      False   \n",
       "name                                                          1   \n",
       "bot                                                           0   \n",
       "prediction                [the, a, of, by, and, bot, I, to, in]   \n",
       "\n",
       "                                                                    6     \\\n",
       "screen_name                                                            1   \n",
       "location                                                               0   \n",
       "description            [, starting, @avoision, Twitter, posting, with...   \n",
       "url                                                                 None   \n",
       "followers_count                                                      136   \n",
       "friends_count                                                          1   \n",
       "listed_count                                                          21   \n",
       "created_at                                Sat Jan 17 01:58:53 +0000 2015   \n",
       "favourites_count                                                       5   \n",
       "verified                                                               0   \n",
       "statuses_count                                                     34446   \n",
       "lang                                                                  en   \n",
       "status                                                                 1   \n",
       "default_profile                                                        0   \n",
       "default_profile_image                                               None   \n",
       "has_extended_profile                                                None   \n",
       "name                                                                   1   \n",
       "bot                                                                  NaN   \n",
       "prediction             [a, from, for, the, of, and, to, that, A, #bot...   \n",
       "\n",
       "                                                                    7     \\\n",
       "screen_name                                                            1   \n",
       "location                                                               0   \n",
       "description            [, your, improved?, to, own, it, New, @dbaker_...   \n",
       "url                                               http://t.co/wIjkIUKTLc   \n",
       "followers_count                                                      134   \n",
       "friends_count                                                          0   \n",
       "listed_count                                                          17   \n",
       "created_at                                Sat Jul 05 12:25:08 +0000 2014   \n",
       "favourites_count                                                       0   \n",
       "verified                                                               0   \n",
       "statuses_count                                                      7597   \n",
       "lang                                                                  en   \n",
       "status                                                                 1   \n",
       "default_profile                                                        0   \n",
       "default_profile_image                                               None   \n",
       "has_extended_profile                                                None   \n",
       "name                                                                   1   \n",
       "bot                                                                  NaN   \n",
       "prediction             [a, bot, the, of, I, in, for, from, with, is, ...   \n",
       "\n",
       "                                                                    8     \\\n",
       "screen_name                                                            1   \n",
       "location                                                               0   \n",
       "description                                [#botALLY, by, @czircon, Bot]   \n",
       "url                                              https://t.co/b6Jew2xZ5t   \n",
       "followers_count                                                       26   \n",
       "friends_count                                                          0   \n",
       "listed_count                                                           6   \n",
       "created_at                                Fri Apr 08 19:19:00 +0000 2016   \n",
       "favourites_count                                                       0   \n",
       "verified                                                               0   \n",
       "statuses_count                                                      6832   \n",
       "lang                                                                  en   \n",
       "status                                                                 1   \n",
       "default_profile                                                        0   \n",
       "default_profile_image                                               None   \n",
       "has_extended_profile                                                None   \n",
       "name                                                                   1   \n",
       "bot                                                                  NaN   \n",
       "prediction             [a, the, of, bot, for, from, with, , that, A, ...   \n",
       "\n",
       "                                                                    9     ...  \\\n",
       "screen_name                                                            1  ...   \n",
       "location                                                               0  ...   \n",
       "description            [10, Following, I, Im, Tweets, not, but, minut...  ...   \n",
       "url                                                                 None  ...   \n",
       "followers_count                                                       67  ...   \n",
       "friends_count                                                          0  ...   \n",
       "listed_count                                                          19  ...   \n",
       "created_at                                                7/22/2014 3:42  ...   \n",
       "favourites_count                                                       0  ...   \n",
       "verified                                                               0  ...   \n",
       "statuses_count                                                    120232  ...   \n",
       "lang                                                                  en  ...   \n",
       "status                                                                 1  ...   \n",
       "default_profile                                                        0  ...   \n",
       "default_profile_image                                               None  ...   \n",
       "has_extended_profile                                                None  ...   \n",
       "name                                                                   1  ...   \n",
       "bot                                                                  NaN  ...   \n",
       "prediction             [and, a, for, The, from, , is, to, in, by, bot...  ...   \n",
       "\n",
       "                                                                    1704  \\\n",
       "screen_name                                                            1   \n",
       "location                                                               1   \n",
       "description            [guitar, I, coming, throat, album, play, in, t...   \n",
       "url                                              https://t.co/2SZUeRSMor   \n",
       "followers_count                                                    50441   \n",
       "friends_count                                                        197   \n",
       "listed_count                                                         894   \n",
       "created_at                                Thu Mar 12 06:24:54 +0000 2009   \n",
       "favourites_count                                                    2783   \n",
       "verified                                                               1   \n",
       "statuses_count                                                      4652   \n",
       "lang                                                                  en   \n",
       "status                                                                 1   \n",
       "default_profile                                                        0   \n",
       "default_profile_image                                               None   \n",
       "has_extended_profile                                                None   \n",
       "name                                                                   1   \n",
       "bot                                                                  NaN   \n",
       "prediction             [for, of, to, a, by, is, from, with, every, bo...   \n",
       "\n",
       "                                                                    1705  \\\n",
       "screen_name                                                            1   \n",
       "location                                                               1   \n",
       "description            [healthy, @UMatterDontQuit, so, choices, I, @R...   \n",
       "url                                               http://t.co/i0zlHsC1xM   \n",
       "followers_count                                                   435871   \n",
       "friends_count                                                       2337   \n",
       "listed_count                                                        4662   \n",
       "created_at                                Wed Feb 04 20:07:27 +0000 2009   \n",
       "favourites_count                                                     575   \n",
       "verified                                                               1   \n",
       "statuses_count                                                     27102   \n",
       "lang                                                                  en   \n",
       "status                                                                 1   \n",
       "default_profile                                                        0   \n",
       "default_profile_image                                               None   \n",
       "has_extended_profile                                                None   \n",
       "name                                                                   1   \n",
       "bot                                                                  NaN   \n",
       "prediction             [a, in, of, to, and, from, am, with, that, you...   \n",
       "\n",
       "                                                                   1706  \\\n",
       "screen_name                                                           1   \n",
       "location                                                              1   \n",
       "description            [literatura, de, Troca, Livro, livros, livraria]   \n",
       "url                                                                None   \n",
       "followers_count                                                   24244   \n",
       "friends_count                                                     24852   \n",
       "listed_count                                                        742   \n",
       "created_at                                              3/18/2009 15:09   \n",
       "favourites_count                                                     26   \n",
       "verified                                                              0   \n",
       "statuses_count                                                     1568   \n",
       "lang                                                                 pt   \n",
       "status                                                                1   \n",
       "default_profile                                                       0   \n",
       "default_profile_image                                             False   \n",
       "has_extended_profile                                              False   \n",
       "name                                                                  1   \n",
       "bot                                                                   0   \n",
       "prediction                                                           []   \n",
       "\n",
       "                                                                    1707  \\\n",
       "screen_name                                                            1   \n",
       "location                                                               1   \n",
       "description                  [naughty, \"\"you, bad, \"\"\"20, me, made, and]   \n",
       "url                                 call me a bitch im proud of it.\"\"\"\"\"   \n",
       "followers_count                                                      NaN   \n",
       "friends_count                                                       1391   \n",
       "listed_count                                                         477   \n",
       "created_at                                                           108   \n",
       "favourites_count                                                     NaN   \n",
       "verified                                                               0   \n",
       "statuses_count                                                       NaN   \n",
       "lang                                                               74779   \n",
       "status                                                                 1   \n",
       "default_profile                                                        0   \n",
       "default_profile_image                                               None   \n",
       "has_extended_profile                                                None   \n",
       "name                                                                   1   \n",
       "bot                                                                  NaN   \n",
       "prediction             [for, a, from, , is, of, bot, the, I, to, on, ...   \n",
       "\n",
       "                                                           1708  \\\n",
       "screen_name                                                   1   \n",
       "location                                                      1   \n",
       "description            [never, myself, depend, but, on, nobody]   \n",
       "url                                     https://t.co/GHrUZJLiqw   \n",
       "followers_count                                           62647   \n",
       "friends_count                                               277   \n",
       "listed_count                                               1197   \n",
       "created_at                       Sun Aug 23 18:54:32 +0000 2009   \n",
       "favourites_count                                            431   \n",
       "verified                                                      1   \n",
       "statuses_count                                            11885   \n",
       "lang                                                         en   \n",
       "status                                                        1   \n",
       "default_profile                                               0   \n",
       "default_profile_image                                      None   \n",
       "has_extended_profile                                       None   \n",
       "name                                                          1   \n",
       "bot                                                         NaN   \n",
       "prediction                       [for, the, a, of, by, to, and]   \n",
       "\n",
       "                                                                    1709  \\\n",
       "screen_name                                                            1   \n",
       "location                                                               1   \n",
       "description            [next?, What, @ThisIsFusion, John, bot, http:t...   \n",
       "url                                                                 None   \n",
       "followers_count                                                     1628   \n",
       "friends_count                                                          4   \n",
       "listed_count                                                           7   \n",
       "created_at                                Thu Aug 27 15:54:12 +0000 2015   \n",
       "favourites_count                                                       3   \n",
       "verified                                                               0   \n",
       "statuses_count                                                      5155   \n",
       "lang                                                                  en   \n",
       "status                                                                 1   \n",
       "default_profile                                                        0   \n",
       "default_profile_image                                               None   \n",
       "has_extended_profile                                                None   \n",
       "name                                                                   1   \n",
       "bot                                                                  NaN   \n",
       "prediction             [for, the, a, of, by, to, and, #botALLY, from,...   \n",
       "\n",
       "                                                     1710  \\\n",
       "screen_name                                             1   \n",
       "location                                                1   \n",
       "description            [since, 1949, Sprezzy, but, messy]   \n",
       "url                                                  None   \n",
       "followers_count                                         9   \n",
       "friends_count                                         462   \n",
       "listed_count                                            0   \n",
       "created_at                               10/29/2016 22:46   \n",
       "favourites_count                                       84   \n",
       "verified                                                0   \n",
       "statuses_count                                         72   \n",
       "lang                                                   en   \n",
       "status                                                  1   \n",
       "default_profile                                         1   \n",
       "default_profile_image                               False   \n",
       "has_extended_profile                                False   \n",
       "name                                                    1   \n",
       "bot                                                     1   \n",
       "prediction                                             []   \n",
       "\n",
       "                                                              1711  \\\n",
       "screen_name                                                      1   \n",
       "location                                                         1   \n",
       "description            [since, 1956, everywhere, cooks, Inspiring]   \n",
       "url                                                           None   \n",
       "followers_count                                                 11   \n",
       "friends_count                                                  745   \n",
       "listed_count                                                     0   \n",
       "created_at                                          1/1/2015 17:44   \n",
       "favourites_count                                               146   \n",
       "verified                                                         0   \n",
       "statuses_count                                                 185   \n",
       "lang                                                            en   \n",
       "status                                                           1   \n",
       "default_profile                                                  0   \n",
       "default_profile_image                                        False   \n",
       "has_extended_profile                                         False   \n",
       "name                                                             1   \n",
       "bot                                                              1   \n",
       "prediction                                                      []   \n",
       "\n",
       "                                                 1712  \\\n",
       "screen_name                                         1   \n",
       "location                                            1   \n",
       "description                   [technolust, yr, trust]   \n",
       "url                            http://t.co/QD54bIMDmT   \n",
       "followers_count                                    75   \n",
       "friends_count                                       4   \n",
       "listed_count                                       12   \n",
       "created_at             Tue Sep 29 22:01:10 +0000 2015   \n",
       "favourites_count                                   12   \n",
       "verified                                            0   \n",
       "statuses_count                                   1770   \n",
       "lang                                               en   \n",
       "status                                              1   \n",
       "default_profile                                     0   \n",
       "default_profile_image                            None   \n",
       "has_extended_profile                             None   \n",
       "name                                                1   \n",
       "bot                                               NaN   \n",
       "prediction                                         []   \n",
       "\n",
       "                                                                    1713  \n",
       "screen_name                                                            1  \n",
       "location                                                               1  \n",
       "description            [where, Television, brought, belongs, -, home,...  \n",
       "url                                                                 None  \n",
       "followers_count                                                        1  \n",
       "friends_count                                                        365  \n",
       "listed_count                                                           0  \n",
       "created_at                                               1/30/2016 21:26  \n",
       "favourites_count                                                      29  \n",
       "verified                                                               0  \n",
       "statuses_count                                                        25  \n",
       "lang                                                                  en  \n",
       "status                                                                 1  \n",
       "default_profile                                                        1  \n",
       "default_profile_image                                              False  \n",
       "has_extended_profile                                               False  \n",
       "name                                                                   1  \n",
       "bot                                                                    1  \n",
       "prediction             [a, for, at, from, with, , is, bot, of, and, I...  \n",
       "\n",
       "[19 rows x 1714 columns]"
      ]
     },
     "execution_count": 19,
     "metadata": {},
     "output_type": "execute_result"
    }
   ],
   "source": [
    "from pyspark.ml.fpm import FPGrowth\n",
    "\n",
    "fpGrowth = FPGrowth(itemsCol=\"description\", minSupport=0.01, minConfidence=0.09)\n",
    "fpGrowth_model = fpGrowth.fit(df_train)\n",
    "\n",
    "# Display frequent itemsets.\n",
    "fpGrowth_model.freqItemsets.show()\n",
    "\n",
    "# Display generated association rules.\n",
    "fpGrowth_model.associationRules.show()\n",
    "\n",
    "# transform() checks each row's items against all the association rules and collects\n",
    "# the consequents of the matching rules (minus items already in the row) into the prediction column\n",
    "fpGrowth_model.transform(df_train).toPandas().transpose()\n"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "What did you get?\n",
    "\n",
    "<details><summary>Check yourself after running the previous code multiple times with different parameter values</summary>\n",
    "<p>\n",
    "\n",
    "If you set `minSupport=0.21` and `minConfidence=0.1`, you get the following frequent itemset (`freq` is the number of rows whose `description` contains it):\n",
    "\n",
    "|items|freq|\n",
    "|----|:----:|\n",
    "|[and]| 389|\n",
    "\n",
    "\n",
    "</p>\n",
    "</details>"
   ]
  },
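  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "To see what `minSupport` and `minConfidence` measure, here is a small brute-force sketch in plain Python. It does **not** implement FP-growth or use Spark, and the toy `rows` data and helper functions are hypothetical. The *support* of an itemset is the fraction of rows that contain all of its items; the *confidence* of a rule `X => Y` is `support(X ∪ Y) / support(X)`. The `predict` helper mimics how `transform()` builds the `prediction` column: it collects the consequents of every rule whose antecedent is contained in the row, skipping items the row already has.\n",
    "\n",
    "```python\n",
    "from itertools import combinations\n",
    "\n",
    "# Hypothetical toy data: each row is the set of words in one description\n",
    "rows = [\n",
    "    {'bot', 'news', 'and'},\n",
    "    {'music', 'and', 'art'},\n",
    "    {'bot', 'and'},\n",
    "    {'news', 'daily'},\n",
    "]\n",
    "\n",
    "def support(itemset, rows):\n",
    "    # Fraction of rows that contain every item of the itemset\n",
    "    return sum(1 for r in rows if itemset <= r) / len(rows)\n",
    "\n",
    "def confidence(antecedent, consequent, rows):\n",
    "    # Share of the rows containing the antecedent that also contain the consequent\n",
    "    return support(antecedent | consequent, rows) / support(antecedent, rows)\n",
    "\n",
    "def frequent_itemsets(rows, min_support, max_size=2):\n",
    "    # Brute force: test every itemset up to max_size (FP-growth avoids this enumeration)\n",
    "    items = sorted(set().union(*rows))\n",
    "    result = {}\n",
    "    for size in range(1, max_size + 1):\n",
    "        for combo in combinations(items, size):\n",
    "            s = support(set(combo), rows)\n",
    "            if s >= min_support:\n",
    "                result[frozenset(combo)] = s\n",
    "    return result\n",
    "\n",
    "def predict(items, rules):\n",
    "    # Union of the consequents of all rules whose antecedent is in the row,\n",
    "    # minus the items the row already contains\n",
    "    out = set()\n",
    "    for antecedent, consequent in rules:\n",
    "        if antecedent <= items:\n",
    "            out |= consequent - items\n",
    "    return out\n",
    "\n",
    "freq = frequent_itemsets(rows, min_support=0.5)\n",
    "print(freq)\n",
    "print(confidence({'bot'}, {'and'}, rows))  # 1.0: every row with 'bot' also has 'and'\n",
    "print(predict({'bot', 'news'}, [({'bot'}, {'and'})]))  # {'and'}\n",
    "```\n",
    "\n",
    "Raising `minSupport` keeps only itemsets that appear in a larger share of rows, and raising `minConfidence` keeps only the more reliable rules. Note that Spark's `freqItemsets` reports `freq` as a row count, not a fraction."
   ]
  },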
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "---"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "✅ **Task:**\n",
    "\n",
    "### LinearRegression functionality\n",
    "\n",
    "It's your turn to create your **second** ML model with Spark: a linear regression model.\n",
    "\n",
    "<details><summary>What is linear regression used for?</summary>\n",
    "<p>\n",
    "Linear regression is a common statistical data analysis technique. It is used to determine the extent to which there is a linear relationship between a dependent variable and one or more independent variables, and to predict the dependent variable from the independent ones.\n",
    "</p>\n",
    "</details>\n",
    "\n",
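    "As a rough illustration of what fitting a linear relationship means, the plain-Python sketch below fits a line `y = intercept + slope * x` to hypothetical toy data with the closed-form ordinary least squares formulas (slope = cov(x, y) / var(x)). Spark's `LinearRegression` solves the same kind of problem for many features at once and can add regularization, so treat this as an illustration of the idea, not Spark's implementation:\n",
    "\n",
    "```python\n",
    "def fit_line(xs, ys):\n",
    "    # Ordinary least squares for a single feature\n",
    "    n = len(xs)\n",
    "    mean_x = sum(xs) / n\n",
    "    mean_y = sum(ys) / n\n",
    "    # Sums of co-deviations and squared deviations (the 1/n factors cancel)\n",
    "    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))\n",
    "    var = sum((x - mean_x) ** 2 for x in xs)\n",
    "    slope = cov / var\n",
    "    intercept = mean_y - slope * mean_x\n",
    "    return intercept, slope\n",
    "\n",
    "# Hypothetical toy data that lies exactly on y = 1 + 2x\n",
    "xs = [0.0, 1.0, 2.0, 3.0]\n",
    "ys = [1.0, 3.0, 5.0, 7.0]\n",
    "intercept, slope = fit_line(xs, ys)\n",
    "print(intercept, slope)  # 1.0 2.0\n",
    "```\n",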
    "\n",
    "**But** before jumping right into it, you should know the following.\n",
    "\n",
    "By default, Spark ML's `LinearRegression` reads two **input** columns:\n",
    "1. `label` of type Double: the value to predict (our classification).\n",
    "2. `features` of type `Vector` (a vector of Doubles): all the feature columns combined into one column named `features`.\n",
    "\n",
    "Hence, you will assemble all the _numeric_ columns into a single vector.\n",
    "\n",
    "Leave `description` out; it is not relevant for this task.\n",
    "\n",
    "To create the `features` column, use `VectorAssembler`:\n",
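    "```python\n",
    "from pyspark.ml.feature import VectorAssembler\n",
    "\n",
    "# \"a\", \"b\", \"c\" are placeholders: replace them with your numeric column names\n",
    "vecAssembler = VectorAssembler(inputCols=[\"a\", \"b\", \"c\"], outputCol=\"features\", handleInvalid=\"skip\")\n",
    "\n",
    "# transform() returns a new DataFrame with an extra `features` column;\n",
    "# `dataFrame` stands for the DataFrame you want to transform (e.g. df_train)\n",
    "new_df = vecAssembler.transform(dataFrame)\n",
    "new_df.show()\n",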
    "```\n",
    "<details><summary><b>Did you know?</b></summary>\n",
    "<p>\n",
    "\n",
    "Spark has two types of vectors:\n",
    "\n",
    "1. Dense vectors\n",
    "2. Sparse vectors\n",
    "\n",
    "A _sparse vector_ stores only its size plus the indices and values of its non-zero entries, so it saves memory when most of the values are zero.\n",
    "\n",
    "A _dense vector_ stores every value, which is more efficient when most of the values are non-zero.\n",
    "\n",
    "</p>\n",
    "</details>"
   ]
  },
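  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "To make the dense/sparse distinction concrete, here is a plain-Python sketch (no Spark required) of the idea behind a sparse vector: keep the vector's size plus the indices and values of its non-zero entries. This mirrors the `(size,[indices],[values])` form Spark uses when printing a `SparseVector`, but `to_sparse` and `to_dense` are hypothetical helpers, not part of Spark's API:\n",
    "\n",
    "```python\n",
    "def to_sparse(dense):\n",
    "    # Keep only the size plus the positions and values of non-zero entries\n",
    "    indices = [i for i, v in enumerate(dense) if v != 0.0]\n",
    "    values = [dense[i] for i in indices]\n",
    "    return len(dense), indices, values\n",
    "\n",
    "def to_dense(size, indices, values):\n",
    "    # Rebuild the full list, filling every missing position with 0.0\n",
    "    dense = [0.0] * size\n",
    "    for i, v in zip(indices, values):\n",
    "        dense[i] = v\n",
    "    return dense\n",
    "\n",
    "# Hypothetical feature row where most values are zero\n",
    "row = [1.0, 0.0, 0.0, 60.0, 0.0, 0.0, 0.0, 10.0]\n",
    "sparse = to_sparse(row)\n",
    "print(sparse)  # (8, [0, 3, 7], [1.0, 60.0, 10.0])\n",
    "assert to_dense(*sparse) == row  # the round trip loses nothing\n",
    "```\n",
    "\n",
    "In PySpark, the corresponding constructors are `Vectors.dense(...)` and `Vectors.sparse(size, indices, values)` from `pyspark.ml.linalg`."
   ]
  },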
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "\n",
    "🤔 **Question**\n",
    "\n",
    "Have you noticed the `handleInvalid` parameter?\n",
    "\n",
    "```python\n",
    "handleInvalid=\"skip\"\n",
    "```\n",
    "\n",
    "What happens if you remove it?\n",
    "\n",
    "Check your answer with `new_df.show()`: call `show()` on the DataFrame returned by `transform()`, since `vecAssembler` itself is a transformer, not a DataFrame, and has no `show()` method.\n",
    "\n",
    "Note that `transform()` returns a new DataFrame. Remember to check your results and keep working with that new DataFrame."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 20,
   "metadata": {},
   "outputs": [
    {
     "data": {
      "text/html": [
       "<div>\n",
       "<style scoped>\n",
       "    .dataframe tbody tr th:only-of-type {\n",
       "        vertical-align: middle;\n",
       "    }\n",
       "\n",
       "    .dataframe tbody tr th {\n",
       "        vertical-align: top;\n",
       "    }\n",
       "\n",
       "    .dataframe thead th {\n",
       "        text-align: right;\n",
       "    }\n",
       "</style>\n",
       "<table border=\"1\" class=\"dataframe\">\n",
       "  <thead>\n",
       "    <tr style=\"text-align: right;\">\n",
       "      <th></th>\n",
       "      <th>0</th>\n",
       "      <th>1</th>\n",
       "      <th>2</th>\n",
       "      <th>3</th>\n",
       "      <th>4</th>\n",
       "      <th>5</th>\n",
       "      <th>6</th>\n",
       "      <th>7</th>\n",
       "      <th>8</th>\n",
       "      <th>9</th>\n",
       "      <th>...</th>\n",
       "      <th>1627</th>\n",
       "      <th>1628</th>\n",
       "      <th>1629</th>\n",
       "      <th>1630</th>\n",
       "      <th>1631</th>\n",
       "      <th>1632</th>\n",
       "      <th>1633</th>\n",
       "      <th>1634</th>\n",
       "      <th>1635</th>\n",
       "      <th>1636</th>\n",
       "    </tr>\n",
       "  </thead>\n",
       "  <tbody>\n",
       "    <tr>\n",
       "      <th>screen_name</th>\n",
       "      <td>1</td>\n",
       "      <td>1</td>\n",
       "      <td>1</td>\n",
       "      <td>1</td>\n",
       "      <td>1</td>\n",
       "      <td>1</td>\n",
       "      <td>1</td>\n",
       "      <td>1</td>\n",
       "      <td>1</td>\n",
       "      <td>1</td>\n",
       "      <td>...</td>\n",
       "      <td>1</td>\n",
       "      <td>1</td>\n",
       "      <td>1</td>\n",
       "      <td>1</td>\n",
       "      <td>1</td>\n",
       "      <td>1</td>\n",
       "      <td>1</td>\n",
       "      <td>1</td>\n",
       "      <td>1</td>\n",
       "      <td>1</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>location</th>\n",
       "      <td>0</td>\n",
       "      <td>0</td>\n",
       "      <td>0</td>\n",
       "      <td>0</td>\n",
       "      <td>0</td>\n",
       "      <td>0</td>\n",
       "      <td>0</td>\n",
       "      <td>0</td>\n",
       "      <td>0</td>\n",
       "      <td>0</td>\n",
       "      <td>...</td>\n",
       "      <td>1</td>\n",
       "      <td>1</td>\n",
       "      <td>1</td>\n",
       "      <td>1</td>\n",
       "      <td>1</td>\n",
       "      <td>1</td>\n",
       "      <td>1</td>\n",
       "      <td>1</td>\n",
       "      <td>1</td>\n",
       "      <td>1</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>url</th>\n",
       "      <td>None</td>\n",
       "      <td>https://t.co/qGCGttmwIw</td>\n",
       "      <td>http://t.co/PdagJGqVMR</td>\n",
       "      <td>http://t.co/PdagJGIwEp</td>\n",
       "      <td>None</td>\n",
       "      <td>None</td>\n",
       "      <td>None</td>\n",
       "      <td>http://t.co/wIjkIUKTLc</td>\n",
       "      <td>https://t.co/b6Jew2xZ5t</td>\n",
       "      <td>None</td>\n",
       "      <td>...</td>\n",
       "      <td>None</td>\n",
       "      <td>https://t.co/2SZUeRSMor</td>\n",
       "      <td>http://t.co/i0zlHsC1xM</td>\n",
       "      <td>None</td>\n",
       "      <td>https://t.co/GHrUZJLiqw</td>\n",
       "      <td>None</td>\n",
       "      <td>None</td>\n",
       "      <td>None</td>\n",
       "      <td>http://t.co/QD54bIMDmT</td>\n",
       "      <td>None</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>followers_count</th>\n",
       "      <td>0</td>\n",
       "      <td>5388</td>\n",
       "      <td>207</td>\n",
       "      <td>93</td>\n",
       "      <td>183</td>\n",
       "      <td>302</td>\n",
       "      <td>136</td>\n",
       "      <td>134</td>\n",
       "      <td>26</td>\n",
       "      <td>67</td>\n",
       "      <td>...</td>\n",
       "      <td>1</td>\n",
       "      <td>50441</td>\n",
       "      <td>435871</td>\n",
       "      <td>24244</td>\n",
       "      <td>62647</td>\n",
       "      <td>1628</td>\n",
       "      <td>9</td>\n",
       "      <td>11</td>\n",
       "      <td>75</td>\n",
       "      <td>1</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>friends_count</th>\n",
       "      <td>60</td>\n",
       "      <td>0</td>\n",
       "      <td>8</td>\n",
       "      <td>0</td>\n",
       "      <td>1</td>\n",
       "      <td>249</td>\n",
       "      <td>1</td>\n",
       "      <td>0</td>\n",
       "      <td>0</td>\n",
       "      <td>0</td>\n",
       "      <td>...</td>\n",
       "      <td>367</td>\n",
       "      <td>197</td>\n",
       "      <td>2337</td>\n",
       "      <td>24852</td>\n",
       "      <td>277</td>\n",
       "      <td>4</td>\n",
       "      <td>462</td>\n",
       "      <td>745</td>\n",
       "      <td>4</td>\n",
       "      <td>365</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>listed_count</th>\n",
       "      <td>0</td>\n",
       "      <td>178</td>\n",
       "      <td>21</td>\n",
       "      <td>24</td>\n",
       "      <td>16</td>\n",
       "      <td>4</td>\n",
       "      <td>21</td>\n",
       "      <td>17</td>\n",
       "      <td>6</td>\n",
       "      <td>19</td>\n",
       "      <td>...</td>\n",
       "      <td>0</td>\n",
       "      <td>894</td>\n",
       "      <td>4662</td>\n",
       "      <td>742</td>\n",
       "      <td>1197</td>\n",
       "      <td>7</td>\n",
       "      <td>0</td>\n",
       "      <td>0</td>\n",
       "      <td>12</td>\n",
       "      <td>0</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>created_at</th>\n",
       "      <td>Sun Feb 19 03:47:42 +0000 2017</td>\n",
       "      <td>Thu Oct 23 17:37:54 +0000 2014</td>\n",
       "      <td>Fri Jul 11 23:18:41 +0000 2014</td>\n",
       "      <td>Tue May 26 18:23:01 +0000 2015</td>\n",
       "      <td>Thu May 19 17:00:07 +0000 2016</td>\n",
       "      <td>8/6/2009 18:16</td>\n",
       "      <td>Sat Jan 17 01:58:53 +0000 2015</td>\n",
       "      <td>Sat Jul 05 12:25:08 +0000 2014</td>\n",
       "      <td>Fri Apr 08 19:19:00 +0000 2016</td>\n",
       "      <td>7/22/2014 3:42</td>\n",
       "      <td>...</td>\n",
       "      <td>1/31/2016 21:35</td>\n",
       "      <td>Thu Mar 12 06:24:54 +0000 2009</td>\n",
       "      <td>Wed Feb 04 20:07:27 +0000 2009</td>\n",
       "      <td>3/18/2009 15:09</td>\n",
       "      <td>Sun Aug 23 18:54:32 +0000 2009</td>\n",
       "      <td>Thu Aug 27 15:54:12 +0000 2015</td>\n",
       "      <td>10/29/2016 22:46</td>\n",
       "      <td>1/1/2015 17:44</td>\n",
       "      <td>Tue Sep 29 22:01:10 +0000 2015</td>\n",
       "      <td>1/30/2016 21:26</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>favourites_count</th>\n",
       "      <td>0</td>\n",
       "      <td>0</td>\n",
       "      <td>0</td>\n",
       "      <td>0</td>\n",
       "      <td>1</td>\n",
       "      <td>953</td>\n",
       "      <td>5</td>\n",
       "      <td>0</td>\n",
       "      <td>0</td>\n",
       "      <td>0</td>\n",
       "      <td>...</td>\n",
       "      <td>29</td>\n",
       "      <td>2783</td>\n",
       "      <td>575</td>\n",
       "      <td>26</td>\n",
       "      <td>431</td>\n",
       "      <td>3</td>\n",
       "      <td>84</td>\n",
       "      <td>146</td>\n",
       "      <td>12</td>\n",
       "      <td>29</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>verified</th>\n",
       "      <td>0</td>\n",
       "      <td>0</td>\n",
       "      <td>0</td>\n",
       "      <td>0</td>\n",
       "      <td>0</td>\n",
       "      <td>0</td>\n",
       "      <td>0</td>\n",
       "      <td>0</td>\n",
       "      <td>0</td>\n",
       "      <td>0</td>\n",
       "      <td>...</td>\n",
       "      <td>0</td>\n",
       "      <td>1</td>\n",
       "      <td>1</td>\n",
       "      <td>0</td>\n",
       "      <td>1</td>\n",
       "      <td>0</td>\n",
       "      <td>0</td>\n",
       "      <td>0</td>\n",
       "      <td>0</td>\n",
       "      <td>0</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>statuses_count</th>\n",
       "      <td>10</td>\n",
       "      <td>31751</td>\n",
       "      <td>5591</td>\n",
       "      <td>5145</td>\n",
       "      <td>3511</td>\n",
       "      <td>17338</td>\n",
       "      <td>34446</td>\n",
       "      <td>7597</td>\n",
       "      <td>6832</td>\n",
       "      <td>120232</td>\n",
       "      <td>...</td>\n",
       "      <td>37</td>\n",
       "      <td>4652</td>\n",
       "      <td>27102</td>\n",
       "      <td>1568</td>\n",
       "      <td>11885</td>\n",
       "      <td>5155</td>\n",
       "      <td>72</td>\n",
       "      <td>185</td>\n",
       "      <td>1770</td>\n",
       "      <td>25</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>lang</th>\n",
       "      <td>en</td>\n",
       "      <td>en</td>\n",
       "      <td>en-gb</td>\n",
       "      <td>en</td>\n",
       "      <td>en</td>\n",
       "      <td>pt</td>\n",
       "      <td>en</td>\n",
       "      <td>en</td>\n",
       "      <td>en</td>\n",
       "      <td>en</td>\n",
       "      <td>...</td>\n",
       "      <td>en</td>\n",
       "      <td>en</td>\n",
       "      <td>en</td>\n",
       "      <td>pt</td>\n",
       "      <td>en</td>\n",
       "      <td>en</td>\n",
       "      <td>en</td>\n",
       "      <td>en</td>\n",
       "      <td>en</td>\n",
       "      <td>en</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>status</th>\n",
       "      <td>1</td>\n",
       "      <td>1</td>\n",
       "      <td>1</td>\n",
       "      <td>1</td>\n",
       "      <td>1</td>\n",
       "      <td>1</td>\n",
       "      <td>1</td>\n",
       "      <td>1</td>\n",
       "      <td>1</td>\n",
       "      <td>1</td>\n",
       "      <td>...</td>\n",
       "      <td>1</td>\n",
       "      <td>1</td>\n",
       "      <td>1</td>\n",
       "      <td>1</td>\n",
       "      <td>1</td>\n",
       "      <td>1</td>\n",
       "      <td>1</td>\n",
       "      <td>1</td>\n",
       "      <td>1</td>\n",
       "      <td>1</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>default_profile</th>\n",
       "      <td>0</td>\n",
       "      <td>0</td>\n",
       "      <td>0</td>\n",
       "      <td>0</td>\n",
       "      <td>0</td>\n",
       "      <td>0</td>\n",
       "      <td>0</td>\n",
       "      <td>0</td>\n",
       "      <td>0</td>\n",
       "      <td>0</td>\n",
       "      <td>...</td>\n",
       "      <td>0</td>\n",
       "      <td>0</td>\n",
       "      <td>0</td>\n",
       "      <td>0</td>\n",
       "      <td>0</td>\n",
       "      <td>0</td>\n",
       "      <td>1</td>\n",
       "      <td>0</td>\n",
       "      <td>0</td>\n",
       "      <td>1</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>default_profile_image</th>\n",
       "      <td>None</td>\n",
       "      <td>None</td>\n",
       "      <td>None</td>\n",
       "      <td>None</td>\n",
       "      <td>None</td>\n",
       "      <td>False</td>\n",
       "      <td>None</td>\n",
       "      <td>None</td>\n",
       "      <td>None</td>\n",
       "      <td>None</td>\n",
       "      <td>...</td>\n",
       "      <td>None</td>\n",
       "      <td>None</td>\n",
       "      <td>None</td>\n",
       "      <td>False</td>\n",
       "      <td>None</td>\n",
       "      <td>None</td>\n",
       "      <td>False</td>\n",
       "      <td>False</td>\n",
       "      <td>None</td>\n",
       "      <td>False</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>has_extended_profile</th>\n",
       "      <td>None</td>\n",
       "      <td>None</td>\n",
       "      <td>None</td>\n",
       "      <td>None</td>\n",
       "      <td>None</td>\n",
       "      <td>False</td>\n",
       "      <td>None</td>\n",
       "      <td>None</td>\n",
       "      <td>None</td>\n",
       "      <td>None</td>\n",
       "      <td>...</td>\n",
       "      <td>None</td>\n",
       "      <td>None</td>\n",
       "      <td>None</td>\n",
       "      <td>False</td>\n",
       "      <td>None</td>\n",
       "      <td>None</td>\n",
       "      <td>False</td>\n",
       "      <td>False</td>\n",
       "      <td>None</td>\n",
       "      <td>False</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>name</th>\n",
       "      <td>1</td>\n",
       "      <td>1</td>\n",
       "      <td>1</td>\n",
       "      <td>1</td>\n",
       "      <td>1</td>\n",
       "      <td>1</td>\n",
       "      <td>1</td>\n",
       "      <td>1</td>\n",
       "      <td>1</td>\n",
       "      <td>1</td>\n",
       "      <td>...</td>\n",
       "      <td>1</td>\n",
       "      <td>1</td>\n",
       "      <td>1</td>\n",
       "      <td>1</td>\n",
       "      <td>1</td>\n",
       "      <td>1</td>\n",
       "      <td>1</td>\n",
       "      <td>1</td>\n",
       "      <td>1</td>\n",
       "      <td>1</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>bot</th>\n",
       "      <td>NaN</td>\n",
       "      <td>NaN</td>\n",
       "      <td>NaN</td>\n",
       "      <td>NaN</td>\n",
       "      <td>NaN</td>\n",
       "      <td>0</td>\n",
       "      <td>NaN</td>\n",
       "      <td>NaN</td>\n",
       "      <td>NaN</td>\n",
       "      <td>NaN</td>\n",
       "      <td>...</td>\n",
       "      <td>NaN</td>\n",
       "      <td>NaN</td>\n",
       "      <td>NaN</td>\n",
       "      <td>0</td>\n",
       "      <td>NaN</td>\n",
       "      <td>NaN</td>\n",
       "      <td>1</td>\n",
       "      <td>1</td>\n",
       "      <td>NaN</td>\n",
       "      <td>1</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>features</th>\n",
       "      <td>(1.0, 0.0, 0.0, 60.0, 0.0, 0.0, 0.0, 10.0, 1.0...</td>\n",
       "      <td>(1.0, 0.0, 5388.0, 0.0, 178.0, 0.0, 0.0, 31751...</td>\n",
       "      <td>[1.0, 0.0, 207.0, 8.0, 21.0, 0.0, 0.0, 5591.0,...</td>\n",
       "      <td>(1.0, 0.0, 93.0, 0.0, 24.0, 0.0, 0.0, 5145.0, ...</td>\n",
       "      <td>[1.0, 0.0, 183.0, 1.0, 16.0, 1.0, 0.0, 3511.0,...</td>\n",
       "      <td>[1.0, 0.0, 302.0, 249.0, 4.0, 953.0, 0.0, 1733...</td>\n",
       "      <td>[1.0, 0.0, 136.0, 1.0, 21.0, 5.0, 0.0, 34446.0...</td>\n",
       "      <td>(1.0, 0.0, 134.0, 0.0, 17.0, 0.0, 0.0, 7597.0,...</td>\n",
       "      <td>(1.0, 0.0, 26.0, 0.0, 6.0, 0.0, 0.0, 6832.0, 1...</td>\n",
       "      <td>(1.0, 0.0, 67.0, 0.0, 19.0, 0.0, 0.0, 120232.0...</td>\n",
       "      <td>...</td>\n",
       "      <td>[1.0, 1.0, 1.0, 367.0, 0.0, 29.0, 0.0, 37.0, 1...</td>\n",
       "      <td>[1.0, 1.0, 50441.0, 197.0, 894.0, 2783.0, 1.0,...</td>\n",
       "      <td>[1.0, 1.0, 435871.0, 2337.0, 4662.0, 575.0, 1....</td>\n",
       "      <td>[1.0, 1.0, 24244.0, 24852.0, 742.0, 26.0, 0.0,...</td>\n",
       "      <td>[1.0, 1.0, 62647.0, 277.0, 1197.0, 431.0, 1.0,...</td>\n",
       "      <td>[1.0, 1.0, 1628.0, 4.0, 7.0, 3.0, 0.0, 5155.0,...</td>\n",
       "      <td>[1.0, 1.0, 9.0, 462.0, 0.0, 84.0, 0.0, 72.0, 1...</td>\n",
       "      <td>[1.0, 1.0, 11.0, 745.0, 0.0, 146.0, 0.0, 185.0...</td>\n",
       "      <td>[1.0, 1.0, 75.0, 4.0, 12.0, 12.0, 0.0, 1770.0,...</td>\n",
       "      <td>[1.0, 1.0, 1.0, 365.0, 0.0, 29.0, 0.0, 25.0, 1...</td>\n",
       "    </tr>\n",
       "  </tbody>\n",
       "</table>\n",
       "<p>18 rows × 1637 columns</p>\n",
       "</div>"
      ],
      "text/plain": [
       "                                                                    0     \\\n",
       "screen_name                                                            1   \n",
       "location                                                               0   \n",
       "url                                                                 None   \n",
       "followers_count                                                        0   \n",
       "friends_count                                                         60   \n",
       "listed_count                                                           0   \n",
       "created_at                                Sun Feb 19 03:47:42 +0000 2017   \n",
       "favourites_count                                                       0   \n",
       "verified                                                               0   \n",
       "statuses_count                                                        10   \n",
       "lang                                                                  en   \n",
       "status                                                                 1   \n",
       "default_profile                                                        0   \n",
       "default_profile_image                                               None   \n",
       "has_extended_profile                                                None   \n",
       "name                                                                   1   \n",
       "bot                                                                  NaN   \n",
       "features               (1.0, 0.0, 0.0, 60.0, 0.0, 0.0, 0.0, 10.0, 1.0...   \n",
       "\n",
       "                                                                    1     \\\n",
       "screen_name                                                            1   \n",
       "location                                                               0   \n",
       "url                                              https://t.co/qGCGttmwIw   \n",
       "followers_count                                                     5388   \n",
       "friends_count                                                          0   \n",
       "listed_count                                                         178   \n",
       "created_at                                Thu Oct 23 17:37:54 +0000 2014   \n",
       "favourites_count                                                       0   \n",
       "verified                                                               0   \n",
       "statuses_count                                                     31751   \n",
       "lang                                                                  en   \n",
       "status                                                                 1   \n",
       "default_profile                                                        0   \n",
       "default_profile_image                                               None   \n",
       "has_extended_profile                                                None   \n",
       "name                                                                   1   \n",
       "bot                                                                  NaN   \n",
       "features               (1.0, 0.0, 5388.0, 0.0, 178.0, 0.0, 0.0, 31751...   \n",
       "\n",
       "                                                                    2     \\\n",
       "screen_name                                                            1   \n",
       "location                                                               0   \n",
       "url                                               http://t.co/PdagJGqVMR   \n",
       "followers_count                                                      207   \n",
       "friends_count                                                          8   \n",
       "listed_count                                                          21   \n",
       "created_at                                Fri Jul 11 23:18:41 +0000 2014   \n",
       "favourites_count                                                       0   \n",
       "verified                                                               0   \n",
       "statuses_count                                                      5591   \n",
       "lang                                                               en-gb   \n",
       "status                                                                 1   \n",
       "default_profile                                                        0   \n",
       "default_profile_image                                               None   \n",
       "has_extended_profile                                                None   \n",
       "name                                                                   1   \n",
       "bot                                                                  NaN   \n",
       "features               [1.0, 0.0, 207.0, 8.0, 21.0, 0.0, 0.0, 5591.0,...   \n",
       "\n",
       "                                                                    3     \\\n",
       "screen_name                                                            1   \n",
       "location                                                               0   \n",
       "url                                               http://t.co/PdagJGIwEp   \n",
       "followers_count                                                       93   \n",
       "friends_count                                                          0   \n",
       "listed_count                                                          24   \n",
       "created_at                                Tue May 26 18:23:01 +0000 2015   \n",
       "favourites_count                                                       0   \n",
       "verified                                                               0   \n",
       "statuses_count                                                      5145   \n",
       "lang                                                                  en   \n",
       "status                                                                 1   \n",
       "default_profile                                                        0   \n",
       "default_profile_image                                               None   \n",
       "has_extended_profile                                                None   \n",
       "name                                                                   1   \n",
       "bot                                                                  NaN   \n",
       "features               (1.0, 0.0, 93.0, 0.0, 24.0, 0.0, 0.0, 5145.0, ...   \n",
       "\n",
       "                                                                    4     \\\n",
       "screen_name                                                            1   \n",
       "location                                                               0   \n",
       "url                                                                 None   \n",
       "followers_count                                                      183   \n",
       "friends_count                                                          1   \n",
       "listed_count                                                          16   \n",
       "created_at                                Thu May 19 17:00:07 +0000 2016   \n",
       "favourites_count                                                       1   \n",
       "verified                                                               0   \n",
       "statuses_count                                                      3511   \n",
       "lang                                                                  en   \n",
       "status                                                                 1   \n",
       "default_profile                                                        0   \n",
       "default_profile_image                                               None   \n",
       "has_extended_profile                                                None   \n",
       "name                                                                   1   \n",
       "bot                                                                  NaN   \n",
       "features               [1.0, 0.0, 183.0, 1.0, 16.0, 1.0, 0.0, 3511.0,...   \n",
       "\n",
       "                                                                    5     \\\n",
       "screen_name                                                            1   \n",
       "location                                                               0   \n",
       "url                                                                 None   \n",
       "followers_count                                                      302   \n",
       "friends_count                                                        249   \n",
       "listed_count                                                           4   \n",
       "created_at                                                8/6/2009 18:16   \n",
       "favourites_count                                                     953   \n",
       "verified                                                               0   \n",
       "statuses_count                                                     17338   \n",
       "lang                                                                  pt   \n",
       "status                                                                 1   \n",
       "default_profile                                                        0   \n",
       "default_profile_image                                              False   \n",
       "has_extended_profile                                               False   \n",
       "name                                                                   1   \n",
       "bot                                                                    0   \n",
       "features               [1.0, 0.0, 302.0, 249.0, 4.0, 953.0, 0.0, 1733...   \n",
       "\n",
       "                                                                    6     \\\n",
       "screen_name                                                            1   \n",
       "location                                                               0   \n",
       "url                                                                 None   \n",
       "followers_count                                                      136   \n",
       "friends_count                                                          1   \n",
       "listed_count                                                          21   \n",
       "created_at                                Sat Jan 17 01:58:53 +0000 2015   \n",
       "favourites_count                                                       5   \n",
       "verified                                                               0   \n",
       "statuses_count                                                     34446   \n",
       "lang                                                                  en   \n",
       "status                                                                 1   \n",
       "default_profile                                                        0   \n",
       "default_profile_image                                               None   \n",
       "has_extended_profile                                                None   \n",
       "name                                                                   1   \n",
       "bot                                                                  NaN   \n",
       "features               [1.0, 0.0, 136.0, 1.0, 21.0, 5.0, 0.0, 34446.0...   \n",
       "\n",
       "                                                                    7     \\\n",
       "screen_name                                                            1   \n",
       "location                                                               0   \n",
       "url                                               http://t.co/wIjkIUKTLc   \n",
       "followers_count                                                      134   \n",
       "friends_count                                                          0   \n",
       "listed_count                                                          17   \n",
       "created_at                                Sat Jul 05 12:25:08 +0000 2014   \n",
       "favourites_count                                                       0   \n",
       "verified                                                               0   \n",
       "statuses_count                                                      7597   \n",
       "lang                                                                  en   \n",
       "status                                                                 1   \n",
       "default_profile                                                        0   \n",
       "default_profile_image                                               None   \n",
       "has_extended_profile                                                None   \n",
       "name                                                                   1   \n",
       "bot                                                                  NaN   \n",
       "features               (1.0, 0.0, 134.0, 0.0, 17.0, 0.0, 0.0, 7597.0,...   \n",
       "\n",
       "                                                                    8     \\\n",
       "screen_name                                                            1   \n",
       "location                                                               0   \n",
       "url                                              https://t.co/b6Jew2xZ5t   \n",
       "followers_count                                                       26   \n",
       "friends_count                                                          0   \n",
       "listed_count                                                           6   \n",
       "created_at                                Fri Apr 08 19:19:00 +0000 2016   \n",
       "favourites_count                                                       0   \n",
       "verified                                                               0   \n",
       "statuses_count                                                      6832   \n",
       "lang                                                                  en   \n",
       "status                                                                 1   \n",
       "default_profile                                                        0   \n",
       "default_profile_image                                               None   \n",
       "has_extended_profile                                                None   \n",
       "name                                                                   1   \n",
       "bot                                                                  NaN   \n",
       "features               (1.0, 0.0, 26.0, 0.0, 6.0, 0.0, 0.0, 6832.0, 1...   \n",
       "\n",
       "                                                                    9     ...  \\\n",
       "screen_name                                                            1  ...   \n",
       "location                                                               0  ...   \n",
       "url                                                                 None  ...   \n",
       "followers_count                                                       67  ...   \n",
       "friends_count                                                          0  ...   \n",
       "listed_count                                                          19  ...   \n",
       "created_at                                                7/22/2014 3:42  ...   \n",
       "favourites_count                                                       0  ...   \n",
       "verified                                                               0  ...   \n",
       "statuses_count                                                    120232  ...   \n",
       "lang                                                                  en  ...   \n",
       "status                                                                 1  ...   \n",
       "default_profile                                                        0  ...   \n",
       "default_profile_image                                               None  ...   \n",
       "has_extended_profile                                                None  ...   \n",
       "name                                                                   1  ...   \n",
       "bot                                                                  NaN  ...   \n",
       "features               (1.0, 0.0, 67.0, 0.0, 19.0, 0.0, 0.0, 120232.0...  ...   \n",
       "\n",
       "                                                                    1627  \\\n",
       "screen_name                                                            1   \n",
       "location                                                               1   \n",
       "url                                                                 None   \n",
       "followers_count                                                        1   \n",
       "friends_count                                                        367   \n",
       "listed_count                                                           0   \n",
       "created_at                                               1/31/2016 21:35   \n",
       "favourites_count                                                      29   \n",
       "verified                                                               0   \n",
       "statuses_count                                                        37   \n",
       "lang                                                                  en   \n",
       "status                                                                 1   \n",
       "default_profile                                                        0   \n",
       "default_profile_image                                               None   \n",
       "has_extended_profile                                                None   \n",
       "name                                                                   1   \n",
       "bot                                                                  NaN   \n",
       "features               [1.0, 1.0, 1.0, 367.0, 0.0, 29.0, 0.0, 37.0, 1...   \n",
       "\n",
       "                                                                    1628  \\\n",
       "screen_name                                                            1   \n",
       "location                                                               1   \n",
       "url                                              https://t.co/2SZUeRSMor   \n",
       "followers_count                                                    50441   \n",
       "friends_count                                                        197   \n",
       "listed_count                                                         894   \n",
       "created_at                                Thu Mar 12 06:24:54 +0000 2009   \n",
       "favourites_count                                                    2783   \n",
       "verified                                                               1   \n",
       "statuses_count                                                      4652   \n",
       "lang                                                                  en   \n",
       "status                                                                 1   \n",
       "default_profile                                                        0   \n",
       "default_profile_image                                               None   \n",
       "has_extended_profile                                                None   \n",
       "name                                                                   1   \n",
       "bot                                                                  NaN   \n",
       "features               [1.0, 1.0, 50441.0, 197.0, 894.0, 2783.0, 1.0,...   \n",
       "\n",
       "                                                                    1629  \\\n",
       "screen_name                                                            1   \n",
       "location                                                               1   \n",
       "url                                               http://t.co/i0zlHsC1xM   \n",
       "followers_count                                                   435871   \n",
       "friends_count                                                       2337   \n",
       "listed_count                                                        4662   \n",
       "created_at                                Wed Feb 04 20:07:27 +0000 2009   \n",
       "favourites_count                                                     575   \n",
       "verified                                                               1   \n",
       "statuses_count                                                     27102   \n",
       "lang                                                                  en   \n",
       "status                                                                 1   \n",
       "default_profile                                                        0   \n",
       "default_profile_image                                               None   \n",
       "has_extended_profile                                                None   \n",
       "name                                                                   1   \n",
       "bot                                                                  NaN   \n",
       "features               [1.0, 1.0, 435871.0, 2337.0, 4662.0, 575.0, 1....   \n",
       "\n",
       "                                                                    1630  \\\n",
       "screen_name                                                            1   \n",
       "location                                                               1   \n",
       "url                                                                 None   \n",
       "followers_count                                                    24244   \n",
       "friends_count                                                      24852   \n",
       "listed_count                                                         742   \n",
       "created_at                                               3/18/2009 15:09   \n",
       "favourites_count                                                      26   \n",
       "verified                                                               0   \n",
       "statuses_count                                                      1568   \n",
       "lang                                                                  pt   \n",
       "status                                                                 1   \n",
       "default_profile                                                        0   \n",
       "default_profile_image                                              False   \n",
       "has_extended_profile                                               False   \n",
       "name                                                                   1   \n",
       "bot                                                                    0   \n",
       "features               [1.0, 1.0, 24244.0, 24852.0, 742.0, 26.0, 0.0,...   \n",
       "\n",
       "                                                                    1631  \\\n",
       "screen_name                                                            1   \n",
       "location                                                               1   \n",
       "url                                              https://t.co/GHrUZJLiqw   \n",
       "followers_count                                                    62647   \n",
       "friends_count                                                        277   \n",
       "listed_count                                                        1197   \n",
       "created_at                                Sun Aug 23 18:54:32 +0000 2009   \n",
       "favourites_count                                                     431   \n",
       "verified                                                               1   \n",
       "statuses_count                                                     11885   \n",
       "lang                                                                  en   \n",
       "status                                                                 1   \n",
       "default_profile                                                        0   \n",
       "default_profile_image                                               None   \n",
       "has_extended_profile                                                None   \n",
       "name                                                                   1   \n",
       "bot                                                                  NaN   \n",
       "features               [1.0, 1.0, 62647.0, 277.0, 1197.0, 431.0, 1.0,...   \n",
       "\n",
       "                                                                    1632  \\\n",
       "screen_name                                                            1   \n",
       "location                                                               1   \n",
       "url                                                                 None   \n",
       "followers_count                                                     1628   \n",
       "friends_count                                                          4   \n",
       "listed_count                                                           7   \n",
       "created_at                                Thu Aug 27 15:54:12 +0000 2015   \n",
       "favourites_count                                                       3   \n",
       "verified                                                               0   \n",
       "statuses_count                                                      5155   \n",
       "lang                                                                  en   \n",
       "status                                                                 1   \n",
       "default_profile                                                        0   \n",
       "default_profile_image                                               None   \n",
       "has_extended_profile                                                None   \n",
       "name                                                                   1   \n",
       "bot                                                                  NaN   \n",
       "features               [1.0, 1.0, 1628.0, 4.0, 7.0, 3.0, 0.0, 5155.0,...   \n",
       "\n",
       "                                                                    1633  \\\n",
       "screen_name                                                            1   \n",
       "location                                                               1   \n",
       "url                                                                 None   \n",
       "followers_count                                                        9   \n",
       "friends_count                                                        462   \n",
       "listed_count                                                           0   \n",
       "created_at                                              10/29/2016 22:46   \n",
       "favourites_count                                                      84   \n",
       "verified                                                               0   \n",
       "statuses_count                                                        72   \n",
       "lang                                                                  en   \n",
       "status                                                                 1   \n",
       "default_profile                                                        1   \n",
       "default_profile_image                                              False   \n",
       "has_extended_profile                                               False   \n",
       "name                                                                   1   \n",
       "bot                                                                    1   \n",
       "features               [1.0, 1.0, 9.0, 462.0, 0.0, 84.0, 0.0, 72.0, 1...   \n",
       "\n",
       "                                                                    1634  \\\n",
       "screen_name                                                            1   \n",
       "location                                                               1   \n",
       "url                                                                 None   \n",
       "followers_count                                                       11   \n",
       "friends_count                                                        745   \n",
       "listed_count                                                           0   \n",
       "created_at                                                1/1/2015 17:44   \n",
       "favourites_count                                                     146   \n",
       "verified                                                               0   \n",
       "statuses_count                                                       185   \n",
       "lang                                                                  en   \n",
       "status                                                                 1   \n",
       "default_profile                                                        0   \n",
       "default_profile_image                                              False   \n",
       "has_extended_profile                                               False   \n",
       "name                                                                   1   \n",
       "bot                                                                    1   \n",
       "features               [1.0, 1.0, 11.0, 745.0, 0.0, 146.0, 0.0, 185.0...   \n",
       "\n",
       "                                                                    1635  \\\n",
       "screen_name                                                            1   \n",
       "location                                                               1   \n",
       "url                                               http://t.co/QD54bIMDmT   \n",
       "followers_count                                                       75   \n",
       "friends_count                                                          4   \n",
       "listed_count                                                          12   \n",
       "created_at                                Tue Sep 29 22:01:10 +0000 2015   \n",
       "favourites_count                                                      12   \n",
       "verified                                                               0   \n",
       "statuses_count                                                      1770   \n",
       "lang                                                                  en   \n",
       "status                                                                 1   \n",
       "default_profile                                                        0   \n",
       "default_profile_image                                               None   \n",
       "has_extended_profile                                                None   \n",
       "name                                                                   1   \n",
       "bot                                                                  NaN   \n",
       "features               [1.0, 1.0, 75.0, 4.0, 12.0, 12.0, 0.0, 1770.0,...   \n",
       "\n",
       "                                                                    1636  \n",
       "screen_name                                                            1  \n",
       "location                                                               1  \n",
       "url                                                                 None  \n",
       "followers_count                                                        1  \n",
       "friends_count                                                        365  \n",
       "listed_count                                                           0  \n",
       "created_at                                               1/30/2016 21:26  \n",
       "favourites_count                                                      29  \n",
       "verified                                                               0  \n",
       "statuses_count                                                        25  \n",
       "lang                                                                  en  \n",
       "status                                                                 1  \n",
       "default_profile                                                        1  \n",
       "default_profile_image                                              False  \n",
       "has_extended_profile                                               False  \n",
       "name                                                                   1  \n",
       "bot                                                                    1  \n",
       "features               [1.0, 1.0, 1.0, 365.0, 0.0, 29.0, 0.0, 25.0, 1...  \n",
       "\n",
       "[18 rows x 1637 columns]"
      ]
     },
     "execution_count": 20,
     "metadata": {},
     "output_type": "execute_result"
    }
   ],
   "source": [
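    "from pyspark.ml.feature import VectorAssembler\n",
    "\n",
    "# 'description' is free text, so it cannot go into a numeric feature vector\n",
    "train = df_train.drop('description')\n",
    "\n",
    "# Combine the numeric columns into one vector column named 'features'.\n",
    "# handleInvalid='skip' drops rows that have a null/NaN in any of these input columns.\n",
    "feature_cols = ['screen_name', 'location', 'followers_count', 'friends_count', 'listed_count', 'favourites_count',\n",
    "                'verified', 'statuses_count', 'status', 'default_profile', 'name']\n",
    "vecAssembler = VectorAssembler(inputCols=feature_cols, outputCol='features', handleInvalid='skip')\n",
    "df_train = vecAssembler.transform(train)\n",
    "df_train.toPandas().transpose()"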
   ]
  },
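  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "`VectorAssembler` concatenates the listed input columns into one vector per row. The sketch below imitates that behaviour in plain Python so you can see what `handleInvalid='skip'` does. It is an illustration under simplifying assumptions (rows are dicts, inputs are plain numbers), not Spark's implementation, and `assemble` is a hypothetical helper. Only the *input* columns are checked, which is why rows with a missing `bot` value survive the assembler.\n",
    "\n",
    "```python\n",
    "import math\n",
    "\n",
    "\n",
    "def assemble(rows, input_cols, handle_invalid='skip'):\n",
    "    # Hypothetical plain-Python imitation of VectorAssembler (not Spark's code)\n",
    "    out = []\n",
    "    for row in rows:\n",
    "        values = [row.get(c) for c in input_cols]\n",
    "        # A value is invalid if it is missing (None) or NaN\n",
    "        invalid = any(v is None or (isinstance(v, float) and math.isnan(v)) for v in values)\n",
    "        if invalid:\n",
    "            if handle_invalid == 'skip':\n",
    "                continue  # drop the whole row\n",
    "            if handle_invalid == 'error':\n",
    "                raise ValueError('null or NaN in input columns: %r' % (row,))\n",
    "            # 'keep': put NaN where the value was missing\n",
    "            values = [float('nan') if v is None else v for v in values]\n",
    "        # Copy the row and add the assembled vector; other columns are untouched\n",
    "        out.append(dict(row, features=[float(v) for v in values]))\n",
    "    return out\n",
    "\n",
    "\n",
    "# Made-up toy rows\n",
    "rows = [\n",
    "    {'followers_count': 136, 'friends_count': 1, 'bot': None},  # label missing\n",
    "    {'followers_count': None, 'friends_count': 5, 'bot': 1},    # input value missing\n",
    "]\n",
    "result = assemble(rows, ['followers_count', 'friends_count'])\n",
    "print(result)  # only the first row is kept, still with bot=None\n",
    "```"
   ]
  },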
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "---"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "\n",
    "✅ **Task:** \n",
    "\n",
    "To run the ML training phase, you need a DataFrame built around the assembled `features` vector.\n",
    "\n",
    "`bot` (the value we want to predict) and `features` are the **only** columns we care about.\n",
    "\n",
    "Remove the other columns with the `drop` function:\n",
    "    \n",
    "```python\n",
    "\n",
    "output = df_train.drop(\"val1\",\"val2\")\n",
    "```\n",
    "\n"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 21,
   "metadata": {},
   "outputs": [
    {
     "name": "stdout",
     "output_type": "stream",
     "text": [
      "+--------------------+--------------------+-----+---------------------+--------------------+----+--------------------+\n",
      "|                 url|          created_at| lang|default_profile_image|has_extended_profile| bot|            features|\n",
      "+--------------------+--------------------+-----+---------------------+--------------------+----+--------------------+\n",
      "|                null|Sun Feb 19 03:47:...|   en|                 null|                null|null|(11,[0,3,7,8,10],...|\n",
      "|https://t.co/qGCG...|Thu Oct 23 17:37:...|   en|                 null|                null|null|(11,[0,2,4,7,8,10...|\n",
      "|http://t.co/PdagJ...|Fri Jul 11 23:18:...|en-gb|                 null|                null|null|[1.0,0.0,207.0,8....|\n",
      "|http://t.co/PdagJ...|Tue May 26 18:23:...|   en|                 null|                null|null|(11,[0,2,4,7,8,10...|\n",
      "|                null|Thu May 19 17:00:...|   en|                 null|                null|null|[1.0,0.0,183.0,1....|\n",
      "|                null|      8/6/2009 18:16|   pt|                false|               false|   0|[1.0,0.0,302.0,24...|\n",
      "|                null|Sat Jan 17 01:58:...|   en|                 null|                null|null|[1.0,0.0,136.0,1....|\n",
      "|http://t.co/wIjkI...|Sat Jul 05 12:25:...|   en|                 null|                null|null|(11,[0,2,4,7,8,10...|\n",
      "|https://t.co/b6Je...|Fri Apr 08 19:19:...|   en|                 null|                null|null|(11,[0,2,4,7,8,10...|\n",
      "|                null|      7/22/2014 3:42|   en|                 null|                null|null|(11,[0,2,4,7,8,10...|\n",
      "|                null|Sat Feb 28 13:58:...|   en|                 null|                null|null|[1.0,0.0,27.0,326...|\n",
      "|http://t.co/DPeRq...|Fri May 16 01:03:...|   en|                 null|                null|null|[1.0,0.0,74704.0,...|\n",
      "|                null|Tue Dec 29 21:18:...|   en|                 null|                null|null|[1.0,0.0,1227898....|\n",
      "|                null|Thu Mar 24 00:32:...|   en|                 null|                null|null|(11,[0,2,4,7,8,10...|\n",
      "|http://t.co/t34fK...|Mon Oct 07 23:44:...|   en|                 null|                null|null|[1.0,0.0,119.0,1....|\n",
      "|                null|Mon Jan 30 23:39:...|   en|                 null|                null|null|[1.0,0.0,25.0,897...|\n",
      "|                null|Thu Jun 04 14:33:...|   en|                 null|                null|null|[1.0,0.0,102.0,21...|\n",
      "|http://t.co/SRwiZ...|      7/1/2013 13:34|en-gb|                false|               false|   1|[1.0,0.0,741.0,3....|\n",
      "|                null|Fri Jan 17 01:37:...|   en|                 null|                null|null|[1.0,0.0,185.0,1....|\n",
      "|                null|       6/9/2011 0:42|   en|                 null|                null|null|[1.0,0.0,358.0,50...|\n",
      "+--------------------+--------------------+-----+---------------------+--------------------+----+--------------------+\n",
      "only showing top 20 rows\n",
      "\n"
     ]
    }
   ],
   "source": [
    "output_train = df_train.drop('screen_name','location','followers_count','friends_count','listed_count','favourites_count','verified','statuses_count','status','default_profile','name')\n",
    "output_train.show()"
   ]
  },
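  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "In the `features` column above, some rows print as `[1.0,0.0,207.0,...]` and others as `(11,[0,3,7,8,10],...)`. The first is a dense vector; the second is a sparse vector written as `(size,[indices],[values])`, which stores only the non-zero entries. `VectorAssembler` picks whichever form is more compact for each row, and both describe the same kind of 11-element vector. The plain-Python helpers below (hypothetical names, not Spark's API) convert between the two layouts; the example vector uses made-up values placed at the indices shown above.\n",
    "\n",
    "```python\n",
    "def to_sparse(dense):\n",
    "    # Keep only the positions and values of the non-zero entries\n",
    "    indices = [i for i, v in enumerate(dense) if v != 0.0]\n",
    "    return len(dense), indices, [dense[i] for i in indices]\n",
    "\n",
    "\n",
    "def to_dense(size, indices, values):\n",
    "    # Start from all zeros and fill in the stored entries\n",
    "    dense = [0.0] * size\n",
    "    for i, v in zip(indices, values):\n",
    "        dense[i] = v\n",
    "    return dense\n",
    "\n",
    "\n",
    "# Made-up 11-element vector with non-zeros at indices 0, 3, 7, 8 and 10\n",
    "v = [1.0, 0.0, 0.0, 5.0, 0.0, 0.0, 0.0, 120.0, 1.0, 0.0, 1.0]\n",
    "sparse = to_sparse(v)\n",
    "print(sparse)  # (11, [0, 3, 7, 8, 10], [1.0, 5.0, 120.0, 1.0, 1.0])\n",
    "print(to_dense(*sparse) == v)  # the round trip gives back the same vector\n",
    "```"
   ]
  },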
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "After combining the _numeric_ columns into a single `features` column and dropping `description`, the only step left is to create a `label` column."
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
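    "Create a new DataFrame with a `label` column. `selectExpr` keeps `features` and renames `bot` to `label`, the label column name Spark ML estimators look for by default.\n",
    "\n",
    "Code sample:\n",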
    "```python\n",
    "df_for_lr  = output_train.selectExpr(\"features\", \"bot as label\")\n",
    "df_for_lr.show()\n",
    "\n",
    "df_for_lr.toPandas().transpose()\n",
    "```\n",
    "\n",
    "\n",
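    "Note that `df_for_lr` is now the DataFrame you will use to train the `LinearRegression` model. Rows with a missing `bot` value end up with a null `label`; the next cell fills those with 0 using `fillna`."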
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 25,
   "metadata": {},
   "outputs": [
    {
     "name": "stdout",
     "output_type": "stream",
     "text": [
      "+--------------------+-----+\n",
      "|            features|label|\n",
      "+--------------------+-----+\n",
      "|(11,[0,3,7,8,10],...|    0|\n",
      "|(11,[0,2,4,7,8,10...|    0|\n",
      "|[1.0,0.0,207.0,8....|    0|\n",
      "|(11,[0,2,4,7,8,10...|    0|\n",
      "|[1.0,0.0,183.0,1....|    0|\n",
      "|[1.0,0.0,302.0,24...|    0|\n",
      "|[1.0,0.0,136.0,1....|    0|\n",
      "|(11,[0,2,4,7,8,10...|    0|\n",
      "|(11,[0,2,4,7,8,10...|    0|\n",
      "|(11,[0,2,4,7,8,10...|    0|\n",
      "|[1.0,0.0,27.0,326...|    0|\n",
      "|[1.0,0.0,74704.0,...|    0|\n",
      "|[1.0,0.0,1227898....|    0|\n",
      "|(11,[0,2,4,7,8,10...|    0|\n",
      "|[1.0,0.0,119.0,1....|    0|\n",
      "|[1.0,0.0,25.0,897...|    0|\n",
      "|[1.0,0.0,102.0,21...|    0|\n",
      "|[1.0,0.0,741.0,3....|    1|\n",
      "|[1.0,0.0,185.0,1....|    0|\n",
      "|[1.0,0.0,358.0,50...|    0|\n",
      "+--------------------+-----+\n",
      "only showing top 20 rows\n",
      "\n"
     ]
    }
   ],
   "source": [
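    "# Keep the feature vector and rename 'bot' to 'label', the column name LinearRegression uses by default\n",
    "df_for_lr = output_train.selectExpr('features', 'bot as label')\n",
    "\n",
    "# Rows with a missing 'bot' value get label 0 (assumed not to be a bot)\n",
    "df_for_lr = df_for_lr.fillna({'label': 0})\n",
    "df_for_lr.show()"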
   ]
  },
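  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "`fillna({'label': 0})` treats every account with a missing `bot` value as a non-bot. Most of the rows shown earlier had a null `bot`, so this assumption shapes the label distribution. The toy sketch below (made-up labels, plain Python rather than Spark) shows how filling missing labels with 0 lowers the share of positive labels compared with looking only at the labelled rows.\n",
    "\n",
    "```python\n",
    "# Made-up labels: None marks an account whose 'bot' value is missing\n",
    "labels = [None, 0, 1, None, 0, None, 1, 0]\n",
    "\n",
    "known = [y for y in labels if y is not None]\n",
    "filled = [0 if y is None else y for y in labels]  # same idea as fillna({'label': 0})\n",
    "\n",
    "print('bot share among labelled rows: ', sum(known) / len(known))    # 0.4\n",
    "print('bot share after filling with 0:', sum(filled) / len(filled))  # 0.25\n",
    "```"
   ]
  },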
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
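    "Run the next cell and note where the new DataFrame, `df_for_lr`, is used.\n",
    "\n",
    "The cell fits a linear regression model to `df_for_lr`.\n",
    "\n",
    "Try tweaking `maxIter`, `regParam` and `elasticNetParam`: `maxIter` caps the number of optimizer iterations, `regParam` sets the overall regularization strength, and `elasticNetParam` mixes L1 and L2 regularization (0.0 is pure L2 / ridge, 1.0 is pure L1 / lasso)."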
   ]
  },
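  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "For reference, the Spark ML documentation describes the elastic-net linear regression objective roughly as follows, with $\\lambda$ = `regParam` and $\\alpha$ = `elasticNetParam` (Spark also standardizes the features internally by default, which this formula leaves out):\n",
    "\n",
    "$$\n",
    "\\min_{w,\\,b}\\; \\frac{1}{2n}\\sum_{i=1}^{n}\\left(w^\\top x_i + b - y_i\\right)^2 + \\lambda\\left(\\alpha\\,\\|w\\|_1 + \\frac{1-\\alpha}{2}\\,\\|w\\|_2^2\\right)\n",
    "$$\n",
    "\n",
    "Setting $\\alpha = 1$ gives lasso (L1 only), $\\alpha = 0$ gives ridge (L2 only), and a larger $\\lambda$ pushes the coefficients $w$ harder toward zero. The intercept $b$ is not penalized."
   ]
  },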
  {
   "cell_type": "code",
   "execution_count": 26,
   "metadata": {},
   "outputs": [
    {
     "name": "stdout",
     "output_type": "stream",
     "text": [
      "Coefficients: [0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0]\n",
      "Intercept: 0.10201588271227856\n",
      "numIterations: 1\n",
      "objectiveHistory: [0.5000000000000001]\n",
      "+--------------------+\n",
      "|           residuals|\n",
      "+--------------------+\n",
      "|-0.10201588271227856|\n",
      "|-0.10201588271227856|\n",
      "|-0.10201588271227856|\n",
      "|-0.10201588271227856|\n",
      "|-0.10201588271227856|\n",
      "|-0.10201588271227856|\n",
      "|-0.10201588271227856|\n",
      "|-0.10201588271227856|\n",
      "|-0.10201588271227856|\n",
      "|-0.10201588271227856|\n",
      "|-0.10201588271227856|\n",
      "|-0.10201588271227856|\n",
      "|-0.10201588271227856|\n",
      "|-0.10201588271227856|\n",
      "|-0.10201588271227856|\n",
      "|-0.10201588271227856|\n",
      "|-0.10201588271227856|\n",
      "|  0.8979841172877214|\n",
      "|-0.10201588271227856|\n",
      "|-0.10201588271227856|\n",
      "+--------------------+\n",
      "only showing top 20 rows\n",
      "\n",
      "RMSE: 0.302669\n",
      "r2: 0.000000\n"
     ]
    }
   ],
   "source": [
    "from pyspark.ml.regression import LinearRegression\n",
    "\n",
    "\n",
    "lr = LinearRegression(maxIter=10, regParam=0.3, elasticNetParam=0.8)\n",
    "\n",
    "# Fit the model\n",
    "lrModel = lr.fit(df_for_lr)\n",
    "\n",
    "# Print the coefficients and intercept for linear regression\n",
    "print(\"Coefficients: %s\" % str(lrModel.coefficients))\n",
    "print(\"Intercept: %s\" % str(lrModel.intercept))\n",
    "\n",
    "# Summarize the model over the training set and print out some metrics\n",
    "trainingSummary = lrModel.summary\n",
    "print(\"numIterations: %d\" % trainingSummary.totalIterations)\n",
    "print(\"objectiveHistory: %s\" % str(trainingSummary.objectiveHistory))\n",
    "trainingSummary.residuals.show()\n",
    "print(\"RMSE: %f\" % trainingSummary.rootMeanSquaredError)\n",
    "print(\"r2: %f\" % trainingSummary.r2)"
   ]
  },
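  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "All 11 coefficients came out as exactly 0.0, and the intercept (0.1020...) equals the mean of `label`, so the model predicts the same value for every row. The L1 part of the penalty can do this: it shrinks a coefficient all the way to zero whenever that feature's relationship with the label is too weak to pay for the penalty. The sketch below shows this with the closed-form elastic-net solution for a *single* standardized feature (`mean(x) == 0`, `mean(x**2) == 1`) and a centered label. It is a simplified illustration, not Spark's solver, and the helper names are hypothetical; Spark's internal scaling means the exact threshold it applies may differ.\n",
    "\n",
    "```python\n",
    "def soft_threshold(z, gamma):\n",
    "    # Move z toward 0 by gamma; anything within [-gamma, gamma] becomes exactly 0\n",
    "    if z > gamma:\n",
    "        return z - gamma\n",
    "    if z < -gamma:\n",
    "        return z + gamma\n",
    "    return 0.0\n",
    "\n",
    "\n",
    "def elastic_net_1d(z, reg_param, elastic_net_param):\n",
    "    # Minimizer of (1/2n)*sum((y - w*x)**2) + reg_param*(a*|w| + (1 - a)/2*w**2)\n",
    "    # for one standardized feature x and a centered label y, where z = mean(x*y)\n",
    "    l1 = reg_param * elastic_net_param\n",
    "    l2 = reg_param * (1.0 - elastic_net_param)\n",
    "    return soft_threshold(z, l1) / (1.0 + l2)\n",
    "\n",
    "\n",
    "# Same settings as the cell above: regParam=0.3, elasticNetParam=0.8 (L1 threshold 0.24)\n",
    "for z in [0.05, 0.2, 0.5, -0.5]:\n",
    "    print(z, elastic_net_1d(z, 0.3, 0.8))\n",
    "```\n",
    "\n",
    "With `regParam=0` the function returns `z` itself, which is the ordinary least-squares slope in this setup; lowering `regParam` or `elasticNetParam` in the cell above is the corresponding experiment."
   ]
  },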
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "What did you get?\n",
    "\n",
    "What does it look like?\n",
    "\n",
    "What is `r2`? `r2` is short for R-squared:\n",
    ">R-squared is a statistical measure of how close the data are to the fitted regression line. It is also known as the coefficient of determination, or the coefficient of multiple determination for multiple regression.\n",
    "\n",
    "What is RMSE?\n",
    "> Root Mean Square Error (RMSE) is the standard deviation of the residuals (prediction errors). Residuals are a measure of how far from the regression line data points are; RMSE is a measure of how spread out these residuals are. In other words, it tells you how concentrated the data is around the line of best fit.\n",
    "\n",
    "Experiment with the parameters and watch how these metrics change."
   ]
  },
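  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "Both metrics take only a few lines to compute. The plain-Python sketch below (hypothetical helper names, made-up toy labels) computes RMSE as the square root of the mean squared residual and R² as `1 - SS_res / SS_tot`. It also shows a useful baseline: a model that always predicts the mean label gets an R² of exactly 0.\n",
    "\n",
    "```python\n",
    "import math\n",
    "\n",
    "\n",
    "def rmse(y_true, y_pred):\n",
    "    # Square root of the mean squared residual\n",
    "    n = len(y_true)\n",
    "    return math.sqrt(sum((t - p) ** 2 for t, p in zip(y_true, y_pred)) / n)\n",
    "\n",
    "\n",
    "def r2(y_true, y_pred):\n",
    "    # 1 - (residual sum of squares) / (total sum of squares around the mean)\n",
    "    mean = sum(y_true) / len(y_true)\n",
    "    ss_res = sum((t - p) ** 2 for t, p in zip(y_true, y_pred))\n",
    "    ss_tot = sum((t - mean) ** 2 for t in y_true)\n",
    "    return 1.0 - ss_res / ss_tot\n",
    "\n",
    "\n",
    "# Made-up labels (1 = bot) and two sets of predictions\n",
    "y = [0, 0, 0, 1, 0, 0, 1, 0, 0, 0]\n",
    "mean_pred = [sum(y) / len(y)] * len(y)  # always predict the mean label\n",
    "other_pred = [0.1, 0.0, 0.2, 0.8, 0.1, 0.0, 0.7, 0.1, 0.0, 0.2]\n",
    "\n",
    "print('mean predictor:  RMSE=%.4f  r2=%.4f' % (rmse(y, mean_pred), r2(y, mean_pred)))\n",
    "print('other predictor: RMSE=%.4f  r2=%.4f' % (rmse(y, other_pred), r2(y, other_pred)))\n",
    "```"
   ]
  },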
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
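    "RMSE on its own is hard to interpret; compare it with summary statistics of `label`, such as its mean, standard deviation, min and max.\n",
    "\n",
    "Compare the `RMSE` above with the `mean` and `stddev` of `label` in the next output. The RMSE (about 0.3027) is essentially the standard deviation of `label` (about 0.3028), and `r2` is 0: every coefficient was shrunk to 0, so the model predicts the mean of `label` (about 0.102) for every row. In other words, it does no better than a constant guess.\n"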
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 27,
   "metadata": {},
   "outputs": [
    {
     "name": "stdout",
     "output_type": "stream",
     "text": [
      "+-------+-------------------+\n",
      "|summary|              label|\n",
      "+-------+-------------------+\n",
      "|  count|               1637|\n",
      "|   mean|0.10201588271227856|\n",
      "| stddev| 0.3027616849758149|\n",
      "|    min|                  0|\n",
      "|    max|                  1|\n",
      "+-------+-------------------+\n",
      "\n"
     ]
    }
   ],
   "source": [
    "df_for_lr.describe().show()"
   ]
  },
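  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "The RMSE and `stddev` differ only slightly, and the gap has a simple cause. `describe()` reports the *sample* standard deviation (dividing by n - 1), while a constant prediction of the mean has an RMSE equal to the *population* standard deviation (dividing by n). For a 0/1 label that is also sqrt(p(1 - p)), where p is the mean. The quick check below uses only values printed in this notebook.\n",
    "\n",
    "```python\n",
    "import math\n",
    "\n",
    "# Values printed by the cells above\n",
    "n = 1637\n",
    "label_mean = 0.10201588271227856\n",
    "label_stddev = 0.3027616849758149  # sample std from describe()\n",
    "reported_rmse = 0.302669           # RMSE printed by the training summary\n",
    "\n",
    "# Convert the sample std (n - 1 in the denominator) to the population std (n)\n",
    "pop_std = label_stddev * math.sqrt((n - 1) / n)\n",
    "\n",
    "# For a 0/1 label the population std is sqrt(p * (1 - p)) with p = mean\n",
    "pop_std_binary = math.sqrt(label_mean * (1 - label_mean))\n",
    "\n",
    "print('bot rows:', round(label_mean * n))\n",
    "print('population std: %.6f' % pop_std)\n",
    "print('sqrt(p(1-p)):   %.6f' % pop_std_binary)\n",
    "print('reported RMSE:  %.6f' % reported_rmse)\n",
    "```"
   ]
  },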
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "You built 2 machine learning models!\n",
    "\n",
    "However, they didn't produce the best results.\n",
    "\n",
    "It's OK!\n",
    "\n",
    "And it's absolutely normal.\n",
    "\n",
    "In chapter 4 you will learn how to improve them."
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "### To load the models later, save them to disk:"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 28,
   "metadata": {},
   "outputs": [],
   "source": [
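    "# write().overwrite() lets you re-run this cell; plain save() fails if the path already exists\n",
    "lrModel.write().overwrite().save(\"linearRegression_model\")"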
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 29,
   "metadata": {},
   "outputs": [],
   "source": [
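    "# write().overwrite() lets you re-run this cell; plain save() fails if the path already exists\n",
    "fpGrowth_model.write().overwrite().save(\"fpGrowth_model\")"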
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "## Well Done! 👏👏👏\n",
    "## You just finished: Apache Spark ML - Create machine learning models\n",
    "## Next Chapter: Evaluating ML models and using pipelines "
   ]
  }
 ],
 "metadata": {
  "kernelspec": {
   "display_name": "Python 3",
   "language": "python",
   "name": "python3"
  },
  "language_info": {
   "codemirror_mode": {
    "name": "ipython",
    "version": 3
   },
   "file_extension": ".py",
   "mimetype": "text/x-python",
   "name": "python",
   "nbconvert_exporter": "python",
   "pygments_lexer": "ipython3",
   "version": "3.7.6"
  }
 },
 "nbformat": 4,
 "nbformat_minor": 4
}
